Purpose: Invalidity Analysis


Patent: US8190610B2
Filed: 2006-10-05
Issued: 2012-05-29
Patent Holder: (Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc
Inventor(s): Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao

Title: MapReduce for distributed database processing

Abstract: An input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology. By utilizing such a grouping, map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., related by being characterized by a common primary key). The intermediate results of the map processing (key/value pairs) for a particular key can be processed together in a single reduce function by applying a different iterator to intermediate values for each group. Different iterators can be arranged inside reduce functions in ways however desired.
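
The grouping idea in the abstract can be sketched in a few lines. The following is a minimal single-process illustration, not the patent's claimed implementation: the function names (`grouped_map_reduce`, `map_employee`, `map_department`, `join_reducer`), the group labels ("emp", "dept") and the sample records are all hypothetical. It assumes two heterogeneous datasets that share a primary key, maps each group independently, and hands a single reduce call one iterator per group.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KeyValue = Tuple[str, object]
Mapper = Callable[[dict], Iterable[KeyValue]]
Reducer = Callable[[str, Dict[str, Iterator]], Iterable[KeyValue]]


def map_employee(record: dict) -> Iterable[KeyValue]:
    # Group "emp" (hypothetical schema): emit (dept_id, employee name).
    yield record["dept_id"], record["name"]


def map_department(record: dict) -> Iterable[KeyValue]:
    # Group "dept": a different schema, keyed on the same dept_id.
    yield record["dept_id"], record["dept_name"]


def join_reducer(key: str, iterators: Dict[str, Iterator]) -> Iterable[KeyValue]:
    # One reduce call per key; each group arrives through its own iterator.
    dept_names = list(iterators["dept"])
    for employee in iterators["emp"]:
        for dept_name in dept_names:
            yield key, (employee, dept_name)


def grouped_map_reduce(
    inputs: Dict[str, List[dict]],
    mappers: Dict[str, Mapper],
    reducer: Reducer,
) -> List[KeyValue]:
    # intermediate[key][group] holds the mapped values for that key and group.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in inputs.items():
        for record in records:
            for key, value in mappers[group](record):
                intermediate[key][group].append(value)

    output: List[KeyValue] = []
    for key in sorted(intermediate):
        # Build one iterator per group, empty if the group has no values.
        per_group = {g: iter(intermediate[key].get(g, [])) for g in mappers}
        output.extend(reducer(key, per_group))
    return output


employees = [
    {"name": "Ann", "dept_id": "d1"},
    {"name": "Bob", "dept_id": "d2"},
    {"name": "Cy", "dept_id": "d1"},
]
departments = [
    {"dept_id": "d1", "dept_name": "Search"},
    {"dept_id": "d2", "dept_name": "Ads"},
]
mappers = {"emp": map_employee, "dept": map_department}
result = grouped_map_reduce({"emp": employees, "dept": departments}, mappers, join_reducer)
print(result)
```

Here the reducer behaves like an equi-join on `dept_id`; a real deployment would partition the intermediate data across machines, which this sketch deliberately leaves out.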




Disclaimer: The promise of Apex Standards Pseudo Claim Charting (PCC) is not to replace expert opinion but to provide due diligence and transparency prior to high-precision charting. PCC conducts aggressive mapping (based on Broadest Reasonable, Ordinary or Customary Interpretation and Multilingual Translation) between a target patent's claim elements and other documents (potential technical standard specifications or prior art in the same or different jurisdictions), allowing a top-down, a priori evaluation with which stakeholders can assess standard essentiality (potential strengths) or invalidity (potential weaknesses) quickly and effectively before making complex, high-value decisions. PCC is designed to relieve the initial burden of proof via an exhaustive listing of contextual semantic mappings as potential building blocks towards a litigation-ready work product. Stakeholders may then use the mapping to modify shortlisted PCC or identify other relevant materials in order to formulate strategy and achieve further purposes.



  Independent Claim

Ground     Reference     Owner of the Reference     Title     Semantic Mapping     Basis     Anticipation     Challenged Claims
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 39 40 41 42 43 44 45 46
1

USENIX Association Proceedings Of The Sixth Symposium On Operating Systems Design And Implementation (OSDI 04). : 137-149 2004

(Dean, 2004)
No Affiliation     MapReduce: Simplified Data Processing On Large Clusters value pairs value pairs

first set, second set data sets

XXXXXXXXXX
2

CONCURRENCY-PRACTICE AND EXPERIENCE. 9 (9): 897-914 SEP 1997

(Koide, 1997)
The Japan Atomic Energy Research Institute (JAERI), The University of Electro-Communications (電気通信大学, Denki-Tsūshin Daigaku)     A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space different key memory pages

first data, first data group time t

XXXXXX
3

SPEECH COMMUNICATION. 17 (3-4): 263-271 NOV 1995

(Billi, 1995)
Centro Studi e Laboratori Telecomunicazioni (CSELT S.p.A.)     INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE reducing step speech recognition

mapping functions broader set

XXXXXXXXXXXXXXX
4

LECTURE NOTES IN COMPUTER SCIENCE. 637: 404-425 1992

(Lam, 1992)
University of Texas, University of Illinois     OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY different key memory pages

different lists, different intermediate data first order

first intermediate data working set

XXXXXX
5

US20060190243A1

(Sharon Barkai, 2006)
(Original Assignee) Xeround Systems Ltd; Xeround Systems Inc     

(Current Assignee)
NORTHEND NETWORKS Ltd
Method and apparatus for data management data partitions data partitions

data groups two locations

first data, first data set one second

value pairs data items

intermediate data processing step odd number

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses writing status parameters onto a second storage device when a first storage device fails see…

teaches a hashing unit that assigns data to virtual partitions of a database through hashing process…

teaches an interface routine that allows a client process to obtain information related to allocation of leadership…

discloses wherein the heartbeat mechanism receives periodic messages from the other servers in the cluster that indicate…
XXXXXXXXX
6

US20060031268A1

(David Shutt, 2006)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Systems and methods for the repartitioning of data reducing operations readable instructions

data set, first data set said system, one second

XXXXXXXXXX
7

US20050210082A1

(David Shutt, 2005)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Systems and methods for the repartitioning of data reducing operations readable instructions

data set, first data set said system

XXXXXXXXXX
8

US20060095481A1

(Ram Singh, 2006)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Method and system for partition level cleanup of replication conflict metadata reducing operations executing instructions

different lists, different intermediate data rising time

first data, first data group time t

XXXXXXXXXXXXXXXX
9

US20050222980A1

(Evan Lee, 2005)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Fragment elimination different lists, different intermediate data coding program

first data first data

output data set n value

XXXXXX
10

US20050015546A1

(Ofir Zohar, 2005)
(Original Assignee) XIV Ltd     

(Current Assignee)
International Business Machines Corp
Data storage system different schema receiving input

mapping functions more interface

first set group number

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches a method of locking a file system flushing the memory corresponding to the file system and then unlocking the…

discloses writing the file system information after flushing the cache…

teaches an asynchronous remote data mirroring system which utilizes two hosts to execute data relocation and mirroring…

discloses redirecting a write function from a storage to hold queue not related to the storage system the second…
XXXXXX
11

US20040225638A1

(Reinhold Geiselhart, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp ; Kaon Interactive Inc
Method and system for data mining in high dimensional data spaces different lists data processing program

computer system computer system

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses wherein the data model includes a cohorts analysis metric operable to provide a flexible mechanism for…

teaches wherein the data page comprises clustered index leaf pages…

teaches that this present invention relates to query processing and more specifically relates to techniques for…

discloses indices evolve at least in part by providing subsequent users with summary comparison usage information based…
XXXXXXXXXXXXXXXXXXXXXXXXX
12

US20050177553A1

(Alexander Berger, 2005)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Optimized distinct count query system and method data partitions data partitions

second set including one

XXXX
13

US20050097286A1

(Magnus Karlsson, 2005)
(Original Assignee) Hewlett Packard Development Co LP     

(Current Assignee)
Hewlett Packard Development Co LP
Method of instantiating data placement heuristic value pairs threshold limit

different key upper limit

XX
14

US20040073545A1

(Howard Greenblatt, 2004)
(Original Assignee) Metatomix Inc     

(Current Assignee)
OBJECTSTORE Inc
Methods and apparatus for identifying related nodes in a directed graph having named arcs second data set second data set

computing devices first portion

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches providing contextual information on communication devices…

discloses a method for providing search results to a user further comprising generating a second subset of the first…

teaches all the claimed subject matter as discussed above with respect to claim…

discloses a cost analysis engine operably coupled to the segmentation engine for analyzing a cost of adding the…
XXX
15

US20040064454A1

(David Ross, 2004)
(Original Assignee) Raf Technology Inc     

(Current Assignee)
Matthews International Corp
Controlled-access database system and method second set, data set access rights

second data second data

first data first data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches wherein the at least one group includes a group that is controlled by a system administrator column…

teaches a tangible computer-readable medium according to claim…

teaches all the claim subject matters as discussed in claim…

teaches an access control database has access control objects that collectively store information that specifies…
XXXXXXXXXX
16

US20040230586A1

(Abel Wolman, 2004)
(Original Assignee) Abel Wolman     Geometrization for pattern recognition, data analysis, data merging, and multiple criteria decision making partitioning step second partition

data groups more elements

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses that said listener behavior data comprises a plurality chosen from a group of a vote for a music file a vote…

discloses similarity functions involve multidimensional scaling the…

discloses that these buttons merely confirm whether the selected songs are added to the user's playlist…

teaches using a single button to determine and indicate a desired relationship measure…
XXXXXXXX
17

US20050198043A1

(Harry Gruber, 2005)
(Original Assignee) Kintera Inc     

(Current Assignee)
Kintera Inc
Database masking and privilege for organizations mapping functions level organization

computer system computer system

second set including one

particular data group more process

different schema, first schema more fields

XXXXXXXXXXXXXXXXXXXXXXXXXX
18

US20030233370A1

(Albert Barabas, 2003)
(Original Assignee) Miosoft Corp     

(Current Assignee)
Miosoft Corp
Maintaining a relationship between two different items of data includes data includes data

computer system stored data

first data, first data group time t

XXXXXXXXXXXXXXXXXXXXXXXXXXX
19

US20030105782A1

(Robert Brodersen, 2003)
(Original Assignee) Brodersen Robert A.; Prashant Chatterjee; Lim Peter S.     Partially replicated distributed database with multiple levels of remote clients second data, second intermediate data second transaction, first transaction

reduce method program codes

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches a wireless LAN service system for storing data of a mobile terminal into a web storage using a wireless LAN…

teaches wherein the public variable interface has isolation from the server subsystem except for providing the read or…

discloses an information server system and also a method of brokering queries which dispatches the second query to a fallback…

teaches sharing direct read/write access to storage devices via a storage area network column…
XXXXXXXXXXX
20

US7103590B1

(Ravi Murthy, 2006)
(Original Assignee) Oracle International Corp     

(Current Assignee)
Oracle International Corp
Method and system for pipelined database table functions different schema different execution, compile time

data partitions, partitioning step based partitioning

first data, first data group repeating step, steps a

XXXXXXXXXXX
21

US20030055822A1

(Lin Yu, 2003)
(Original Assignee) Trendium Inc     

(Current Assignee)
Viavi Solutions Inc
Database systems, methods and computer program products including primary key and super key indexes for use with partitioned tables different lists, value pairs respective entity

partitioning step, data partitions second partition, first partition

second data, second data group third portion, second data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses a system in which search requests are entered in a message queue to await service…

teaches the message service is operable to detect workload patterns for the plurality of search engines and to…

discloses identifying bottlenecks and using statistics for editing database and search engines properties…

teaches a physician or healthcare provider observer can window and level control brightness and contrast as well as…
XXXXX
22

US20020049759A1

(Loren Christensen, 2002)
(Original Assignee) Loren Christensen     

(Current Assignee)
LINMOR Inc
High performance relational database management system different schema data object

value pairs high speed

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches a computer implemented method for solving a supply chain planning problem see abstract where a resource…

teaches supply chain problems of demand forecasting see pp…

discloses placing the data in tabular format with each tabular row storing one or more performance metrics gathered…

teaches logging performance data at specified intervals and for specified durations facilitates subsequent…
XXXXXX
23

US20020116404A1

(Sang Cha, 2002)
(Original Assignee) Transact In Memory Inc     

(Current Assignee)
SAP SE ; Transact In Memory Inc
Method and system for highly-parallel logging and recovery operation in main-memory transaction processing systems second data set consistent state

particular reducer more log

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses a database corruption recovery system embodied on a computer readable storage medium comprising means for…

teaches wherein each change document includes a time data field indicative of a time when the change document was…

teaches wherein the first bulk delete timestamp and the second bulk delete timestamp comprise information describing…

discloses data items being multigranular items identified by resource identifiers used in a lock manager…
XXXXXX
24

US6678691B1

(Harald Kikkers, 2004)
(Original Assignee) Koninklijke KPN NV     

(Current Assignee)
ATOS ORIGIN NEDERLAND BV ; Koninklijke KPN NV
Method and system for generating corporate information intermediate data, computer system intermediate data

second schema, second set structured data

data groups system users

data set, first data set said system

particular data, particular data group one source

different schema data model

value pairs data items

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
25

EP1032175A2

(Wojciech Gasior, 2000)
(Original Assignee) Sun Microsystems Inc     

(Current Assignee)
Sun Microsystems Inc
System and method for transferring partitioned data sets over multiple threads mapping functions corresponding port

partitioning step partitioning step

data partition data partition

first set first set

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches the method of determining the performance of plurality of servers based on performance measurements such as…

teaches a similar control system wherein the external control device comprises a storage duration management device…

discloses selecting the objects from the palette includes dragging the objects from the palette to the window, i.e., The…

discloses A content distribution method comprising sending distribution requests by a client apparatus to a plurality of…
XXXXX
26

CN1245936A

(陈惠嫈, 2000)
(Original Assignee) Panasonic Corp     

(Current Assignee)
Panasonic Corp
Fixed-format text processing method and apparatus first intermediate data set, intermediate data set 指定的一个

output data groups, output data set 一个输入

different key 输入命令

combine task 产生关联

first data, first data group 第二个

second set 接收一

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses that the data stored in the tables in the relational databases is stored in a standardized format…

teaches the schema is based upon a type and parameters of the columns for DB table see paragraph…

teaches a system that maps object classes in an object-oriented environment to a data source including relationships…

teaches multiple virtual tables correspond to different schemas see paragraph…
XXXXXXXXXXXXXXXXXX
27

EP1040434A1

(Linda G. Demichiel, 2000)
(Original Assignee) Linda G. Demichiel; Roderic G. G. Cattell     

(Current Assignee)
Sun Microsystems Inc
Methods and apparatus for efficiently splitting query execution across client and server in an object-relational mapping reducing operations receiving step

first data first data

second set second set

first set first set

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches that file size may be specified for the files in a given virtual folder this is therefore a sizing property of…

teaches automatically partitioning groups of files into categories and when the number of files in a category exceeds a…

discloses a method for operating a system for assisting a user in locating particular content of interest from a…

teaches all the claimed subject matter as discussed above with respect to claim…
XXXX
28

CN1211769A

(黄永成, 1999)
(Original Assignee) Chinese University of Hong Kong CUHK     

(Current Assignee)
Chinese University of Hong Kong CUHK
Method and apparatus for document retrieval based on Bayesian networks computing devices 计算机中

corresponding different intermediate data 通过分析

first data, first data group 第二个

value pairs 最大值

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses generating global node rank information and generating keyword-specific node rank information using the global…

discloses wherein the combining the multiple initial rankings comprises combining based on user definable adjusting…

teaches a method for grouping data objects to improve data analysis…

teaches a process for creating and displaying a publication historiograph…
XXXX
29

US6158044A

(John J. Tibbetts, 2000)
(Original Assignee) ePropose Inc     

(Current Assignee)
ePropose Inc
Proposal based architecture system different intermediate data User Interface

includes data includes data

particular reducer rule base

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches the limitation wherein the transaction data is associated with a metadata that can be used to recover the…

teaches a method for providing a contract services in a framework comprising the steps of matching a user to a service…

teaches wherein a plurality of query operations are performed in parallel using the distributed memory see paragraph…

teaches shifting the first level of the resultant two-level data structure to the left by a specified number of bits and then…
XXXXXX
30

EP0760500A1

(Billy J. Fuller, 1997)
(Original Assignee) Sun Microsystems Inc     

(Current Assignee)
Sun Microsystems Inc
Partitioning within a partition in a disk file storage system computer system computer system

first set different one

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches an example where eight LBA storage slots of data blocks are mapped to each physical block within a data…

teaches of identifying a list of one or more other volumes in a copyset wherein the copyset includes the volume to be…

teaches a recording medium wherein the recording mode is determined to be one of a sequential recording mode and a…

discloses orderbillings information storing on a recording medium see…
XXXXXXXXXXXXXXXXXXXXXXXX
31

EP0829049A2

(Öystein TORBJÖRNSEN, 1998)
(Original Assignee) Telenor ASA     

(Current Assignee)
Clustra Systems Inc
Continuously available database server having multiple groups of nodes with minimum intersecting sets of database fragment replicas mapping functions different cooling

computer system computer system, different power

first set different one

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses an independent redundant third computer and wherein the second computer is configured to match with the third…

discloses that transaction records are stored in a database at a primary system and also in a duplicate database in a…

teaches recovery in a distributed computing environment see…

discloses accessing the database through an application program interface see column…
XXXXXXXXXXXXXXXXXXXXXXXXX
32

JPH07319923A

(G Stellwagen Richard Jr, 1995)
(Original Assignee) At & T Global Inf Solutions Internatl Inc; エイ・ティ・アンド・ティ グローバル インフォメーション ソルーションズ インターナショナル インコーポレイテッド     Method and apparatus for processing a parallel database in a multiprocessor computer system reduce method 少なくとも, システム

processing data コンピュ

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches that the present invention provides a phrase recognition method which breaks text into text chunks and selects…

teaches a method for automatic web page thumbnail generation comprising receiving keyword col…

teaches searching by certain keywords he does not explicitly teach that said searching includes searching an index of…

teaches that a database system may be partitioned in order to query poll over multiple servers simultaneously…
XXXX
33

EP0692121A1

(Mark Squibb, 1996)
(Original Assignee) Squibb Data Systems Inc     

(Current Assignee)
Squibb Data Systems Inc
File difference engine second set, second data set respective segments

second data second data, said memory

data set, value pairs hash table

first data first data

different lists said list

data partitions one file

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses that access rights can be individually assigned to files pages…

teaches receiving from a remote machine metadata associated with each file comprising the stored application program…

teaches a system and method wherein proxy servers in two or more geographically remote LANs communicate with each…

discloses all aspects of the claimed invention except communication system supports the…
XXXXXXXXXXXX
34

JPH07141394A

(Kazuo Masai, 1995)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Database partition management method and parallel database system computer system 行うこと

different key のキー

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses wherein the data model includes a cohorts analysis metric operable to provide a flexible mechanism for…

teaches wherein the data page comprises clustered index leaf pages…

teaches generating a rank value for each keyword the higher the rank value the top position the keyword will list and…

teaches a method for determining assignees related by common cited references with a source patent portfolio…
XXXXXXXXXXXXXXXXXXXXXXXXX
35

JPH0698770A

(Andrea Califano, 1994)
(Original Assignee) Internatl Business Mach Corp <Ibm>; インターナショナル・ビジネス・マシーンズ・コーポレイション     Finding token sequences in a database of token strings second set 有する第1

processing data コンピュ

reduce method システム

XXXXX
36

EP0583559A1

(Andrea Califano, 1994)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Finding token sequences in a database of token strings includes data contiguous amino acids

reducing operations different lengths

computer system computer system

first data similar manner

data set, value pairs hash table

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
37

CA2600344A1

(Mark Shull, 2006)
(Original Assignee) Markmonitor Inc.; Mark Shull; William Bohlman; Ihab Shraim; Christopher J. Bura; Markmonitor, Inc.     

(Current Assignee)
MarkMonitor Inc
Distribution of trust data first data, first data set communicatively couple, first data

processing data cache server

second data second data

second set second set

first set first set

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches information gathered externally from multiple sources including proxy information and data mining internet…

teaches the processing comprises evaluating one or more parameters selected from among the group consisting of URL…

teaches the claimed subject matter a computer implemented method of certifying a user identity ID see claim…

discloses domain controlling systems methods and computer program products for administration of computer security…
XXXXXXXXXXXX
38

JP2006155663A

(Jean A Marquis, 2006)
(Original Assignee) Sand Technology Systems Internatl Inc; サンド テクノロジー システムズ インターナショナル,インコーポレイティド     Method and system for performing Boolean operations on bit strings using a maximal bit slice partitioning step 施すステップ

reduce method 少なくとも

processing data コンピュ

different schema メモリ

XXXXXXXXX
39

US20060200253A1

(Steven Hoffberg, 2006)
(Original Assignee) STEVEN M HOFFBERG 2004-1 GRAT     

(Current Assignee)
HOFFBERG FAMILY TRUST 1 ; STEVEN M HOFFBERG 2004-1 GRAT ; Blanding Hovenweep LLC
Internet appliance system and method computing devices physical security

second data, second data group signal processor, digital audio

data groups automated device

first data, first data set one second, first data

includes data real time

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches the web-based control page for displaying on a browser for a user interaction with the selected device…

discloses that a preferred viewing schedule for a viewing can be constructed containing programs of interest to the…

discloses maintaining user profile comprising sequence of program categories col…

discloses wherein selecting advertisements for user sessions associated with the user identifier based on the user…
XXXXXXXXXXXXXXXXXX
40

US20060173926A1

(Kevin Kornelson, 2006)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Data transformation to maintain detailed user information in a data warehouse includes data collection system

different key, groups having different schema different key, enable access

data set, first data set said system

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches the invention substantially as claimed as noted above…

discloses applying a hash function to a concatenation of said block identifier and an identifier of a respective one of…

discloses tag information that indicates coordinate information about print data indicating a page orientation…

discloses all aspects of the claimed invention except communication system supports the…
XXXXXXXXXXXXXXXX
41

US20060206507A1

(Ziyad Dahbour, 2006)
(Original Assignee) Dahbour Ziyad M     Hierarchal data management data partitions different partition

first data, first data group first data, time t

providing metadata meta data

XXXXXXX
42

WO2006060773A2

(Patrick Hanrahan, 2006)
(Original Assignee) Tableau Software Llc     Computer systems and methods for visualizing data with generation of marks different schema, first schema different fields, more fields

second schema, second set structured data

second data, second data group said time

XXXXXX
43

CN1761203A

(李生红, 2006)
(Original Assignee) Shanghai Jiaotong University     

(Current Assignee)
Shanghai Jiaotong University
Integrated analysis and monitoring system for online information security partitioning step 进一步

s corresponding data partition to form corresponding intermediate data 别模块

XXX
44

WO2006050349A2

(Fabricio Alves Barbosa Da Silva, 2006)
(Original Assignee) Hewlett-Packard Development Company, L.P.     Methods and apparatus for running applications on computer grids second set, reduce method following steps

includes data computing unit

different lists ordered list

first set, data set input file

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches all aspects of the claimed invention with regard to claims…

teaches a method computer program computer readable medium and computer data signal of claims…

describes that the CPU may input data to the GPU pipeline see FIG…

describes a graphics processor configured to perform at least one video processing operation…
XXXXXXXXXXXXXX
45

US20050273730A1

(Stuart Card, 2005)
(Original Assignee) Card Stuart K; Nation David A     System and method for browsing hierarchically based node-link structures based on an estimated degree of interest first data, first data group repeating step, said sub

particular data linked data

second set, first set second set, more sets

value pairs data items

XXXXXXXXXXXX
46

WO2006019752A1

(Christopher Lunt, 2006)
(Original Assignee) Friendster, Inc.     Methods for authorizing transmission of content from first to second individual and authentication an individual based on an individual’s social network different schema receiving input

second set second set

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches a news ticker interface presenting a plurality of stories including information describing a plurality of…

teaches a method comprising receiving at a social networking system a request from a target user for news ticker…

teaches that the social network database may be physically attached with the social network engine col…

discloses this limitation in that the GUI enables a user to select particular types of recent content publication…
XXXXXX
47

JP2006018843A

(Nicole A Hamilton, 2006)
(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     Dispersing search engine results by using page category information not intermediate data 含むコンピュータ

selected one 各カテゴリ

reduce method 少なくとも

data group, first data group えること

second intermediate data ファイル

first set, second set アップ

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses the method of scoring an automotive retailer according to claim…

discloses in the summary means for elevating certain nodes within a tree such that nodes that are more likely to be…

discloses storing a list of products based on popularity column…

teaches wherein the selection highlight indicator includes a highlighted vertical portion of the map and further…
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
48

JP2006020310A

(Gyung-Pyo Hong, 2006)
(Original Assignee) Samsung Electronics Co Ltd; 三星電子株式会社Samsung Electronics Co.,Ltd.     Section data filtering method and apparatus corresponding different intermediate data のプログラム

data group, first data group えること

processing data コンピュ

XXXXXXXXXXXXXXXXXX
49

US20060218123A1

(Sudipto Chowdhuri, 2006)
(Original Assignee) Sybase Inc     

(Current Assignee)
Sybase Inc
System and Methodology for Parallel Query Optimization Using Semantic-Based Partitioning partitioning step partitioning step

processing data processing data

data partitions data partitions

second set including one

second data said memory

XXXXXXX
50

US20060004851A1

(Steven Gold, 2006)
(Original Assignee) GraphLogic Inc     

(Current Assignee)
GraphLogic Inc
Object process graph relational database interface reducing step software product

second data second data

different schema data object

first data first data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches that file size may be specified for the files in a given virtual folder this is therefore a sizing property of…

discloses indexing media content on the internet comprising a mediax file containing a hierarchy of metadata however…

teaches automatically partitioning groups of files into categories and when the number of files in a category exceeds a…

teaches the claimed dynamically instantiating and assembling a H components of a web page to create at least one…
XXXXXXXXXXXXXXXX
51

US20060036568A1

(Jason Moore, 2006)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
File system shell reducing operations user provides input

first data group, first data set functional modules

different lists corresponding item, said list

first data first data

second set second set

XXXX
52

JP2005353039A

(Yu Chen, 2005)
(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     Data overlay, self-organized metadata overlay, and application-level multicasting data groups, output data groups なるグループ

corresponding different intermediate data 有する単一

computer system 行うこと

first set, second set アップ

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses a method for managing remote resource visibility in a proxy server environment as discussed in claim…

discloses the current operational state of the local peer node is used to determine whether to advertise or withdraw…

teaches a method for delivering published information a schema-based contacts service for…

teaches a method for management of a wireless communication network having a plurality of access nodes the method…
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
53

WO2005098652A2

(Alok Batra, 2005)
(Original Assignee) Cxo Systems, Inc.     Providing enterprise information different intermediate data further process

first set different one

XXXX
54

EP1566752A2

(Asta J. Roseway, 2005)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Corp
Rapid visual sorting of digital files and data data set respective sets, two sets

computer system computer system

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses a main sports genre and a subgenre of an event game such as the…

teaches a plurality of receivers to which contents are input and playing contents which are input through a selected…

discloses a similar method for classifying photographs that further discloses a people classification category and a…

teaches of wherein the specific event is selected from a group comprising an…
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
55

JP2006195890A

(Masaru Fukami, 2006)
(Original Assignee) Fuji Xerox Co Ltd; 富士ゼロックス株式会社     Information processing apparatus, system, data synchronization method, and program different schema 前記第2グループ, 要求手段

processing data コンピュ

reduce method システム

XXXXXXXX
56

JP2005235171A

(Carl D'halluin, 2005)
(Original Assignee) Emc Corp; イーエムシー コーポレイション Emc Corporation     Method and apparatus for generating content addresses indicating data units written to a storage system close together in time not intermediate data 含むコンピュータ

data group, first data group えること

XXXXXXXXXXXXXXXXXXXXXX
57

JP2006041764A

(Fumihiro Umetsu, 2006)
(Original Assignee) Ricoh Co Ltd; 株式会社リコー     Log recording apparatus, log recording program, and recording medium corresponding different intermediate data のプログラム

second intermediate data ファイル

processing data コンピュ

first set, second set アップ

XXXXXXXXXXXX
58

US20050050030A1

(Hakon Gudbjartsson, 2005)
(Original Assignee) Decode Genetics EHF     

(Current Assignee)
Decode Genetics EHF
Set definition language for relational data computer system computer system

second set including one

data set data set

XXXXXXXXXXXXXXXXXXXXXXXXXXXX
59

US20060190195A1

(Tatsuhisa Watanabe, 2006)
(Original Assignee) Kochi University NUC; A&T Corp     

(Current Assignee)
Kochi University NUC ; A&T Corp
Clinical examination analyzing device, clinical examination analyzing method, and program for allowing computer to execute the method s corresponding data partition to form corresponding intermediate data, includes data calculating unit

different key different kind

computing devices waveform data

35 U.S.C. 103(a)
discloses a method of determining whether sufficient amounts of sample liquid have been evenly applied to an application…
XXXXXXXXXX
60

EP1498815A2

(Bruce D. Holenstein, 2005)
(Original Assignee) Gravic Inc     

(Current Assignee)
Gravic Inc
Methods for ensuring referential integrity in multi-threaded replication engines value pairs threshold limit

first data, first data group repeating step

output data groups, second data set elapsed time

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches transferring files to a remote disaster recovery site…

discloses efficient restoration because a copy of files is a primary means of restoring data rather than a transaction…

discloses a file system monitoring method is used to determine whether files have changed so that a minimum set of files…

teaches that this limitation was known in the art at paragraph…
XXXXXX
61

US20050187897A1

(Deepak Pawar, 2005)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
System and method for switching a data partition reducing operations comprises instructions

data group, first data group temporary storage

value pair second portions

computing devices first portion

partitioning step same index

XXXXXXXXXXXXXXXXXXXXXXX
62

JP2005018751A

(Amir Netz, 2005)
(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     System and method for expressing and calculating relationships between measures reduce method システム

computer system 可読媒体

different schema メモリ

XXXXXXXXXXXXXXXXXXXXXXXXXX
63

US20050283679A1

(Thomas Heller, 2005)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Method, system, and computer program product for dynamically managing power in microprocessor chips according to present processing demands second set including one, said server

data set, first data set said system

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses wherein the first virtual storage system and the second virtual storage system are coupled via a network…

discloses wherein when the computer receives a first instruction from the management terminal the first instruction…

teaches receiving a request to instantiate a virtual machine as at the beginning of a time period the hypervisor may…

teaches the second kind physical processor has a lower maximum operating frequency than the first kind physical…
XXXXXXXXXX
64

US20040215640A1

(Roger Bamford, 2004)
(Original Assignee) Oracle International Corp     

(Current Assignee)
Oracle International Corp
Parallel recovery by non-failed nodes particular data particular data

second set second set

first set first set

XXXX
65

CN1778089A

(W·F·J·方蒂恩, 2006)
(Original Assignee) Koninklijke Philips NV     

(Current Assignee)
Koninklijke Philips NV
Peer-to-peer transfer of content data group, second data group 包括程序

computing devices 个人计算, 计算机程

partitioning step 进一步

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses all of the claimed subject matter except that the current work is an audio work and the step of inserting…

teaches the list of files provided included the file name paragraph…

discloses a user device capable of displaying the search results over mobile devices such as PDAs notepads laptops etc…

teaches wherein the plurality of software agents masquerade as nodes in a plurality of decentralized networks for…
XXXXXXXXXXXXXXXXXXX
66

CN1781105A

(拉维·默西, 2006)
(Original Assignee) Oracle International Corp     

(Current Assignee)
Oracle International Corp ; Oracle America Inc
Preserving hierarchical information in the mapping between XML documents and relational data partitioning step 进一步

computing devices 来计算

XXX
67

CN1534518A

(C・纳拉亚南, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Corp
Replication of consistency units in an application-defined system first data 通过下列

computing devices 一种计算

XXXX
68

US20050216428A1

(Yuichi Yagawa, 2005)
(Original Assignee) Hitachi Ltd     

(Current Assignee)
Hitachi Ltd
Distributed data management system groups having different schema more candidate

first data, first data set one second

XXXX
69

JP2005267301A

(Nobuo Kawamura, 2005)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Recovery method and apparatus using synchronous log transfer and asynchronous DB data transfer processing data コンピュ

computer system 行うこと

35 U.S.C. 103(a)

35 U.S.C. 102(b)
describes that in certain embodiments a method comprises processing with one or more routines at least one log file…

describes means for quiescing at a known good state of the computer application and means for recording a time stamp…

discloses access of data in storage array at a point in time…

discloses a buffer processing portion that relays data for the task command and data for the operation result…
XXXXXXXXXXXXXXXXXXXXXXXXXX
70

JP2004303212A

(Thomas P Conlon, 2004)
(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     Proactive caching system and method using OLAP variants associated metadata ユーザ開始動作

output data set パラメータ

processing data コンピュ

computer system 可読媒体

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses a method in which a data entry is tagged with the corresponding data header column…

teaches all of the claimed subject matter as discussed above with respect to claim…

teaches a multidimensional database and storing multidimensional data in a relational database and subdividing the…

teaches converting tagged XML input data containing a child data items within the tags of parents ie elements and data…
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
71

EP1564654A1

(David Paul Yach, 2005)
(Original Assignee) Research in Motion Ltd     

(Current Assignee)
BlackBerry Ltd
Apparatus and method for determining synchronization status of database copies connected by a radio air interface of a radio communication system s corresponding data partition selected portions

data partition mobile node

output data set n value

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses the system for providing information to the mobile client of claim…

teaches the memory storing the status report for a predefined length of time after the status report is transmitted to…

teaches a receiver for receiving positioning data from satellites allowing the processor to use the positioning data…

teaches a communication system comprising A mobile unit having a processor a memory and a wireless modem for…
XXXXXX
72

EP1564658A1

(David Paul Yach, 2005)
(Original Assignee) Research in Motion Ltd     

(Current Assignee)
BlackBerry Ltd
Apparatus and associated method for synchronizing databases by comparing hash values. data partition, data group information representative

mapping functions corresponding port

XXXXXXXXXXXXXXXX
73

JP2005196602A

(Shinji Fujiwara, 2005)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Method for changing system configuration in a shared-nothing database management system corresponding different intermediate data のプログラム

reduce method 少なくとも

second intermediate data ファイル

computer system 行うこと

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches about a preference for how the collected data may be used…

teaches about comparing a list of marketing targets of the marketing process against the customer preference in the…

discloses wherein the data model includes a cohorts analysis metric operable to provide a flexible mechanism for…

teaches a method and system for tokenless authorization of commercial transactions where a buyer registers with a…
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
74

US20040181461A1

(Samir Raiyani, 2004)
(Original Assignee) SAP SE     

(Current Assignee)
SAP SE
Multi-modal sales applications different schema Extensible Markup Language, receiving input

first data group additional input

computer system stored data

second set second set

first set first set

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches that an alternate travel product being upsold is related to the preferred travel product by at least one…

discloses wherein the viewer is able to judge regardless of a route of acquisition corresponding to the number of times of…

discloses a picture information displaying method for displaying on a display picture the information pertinent to…

discloses methods for use by a server for configuring a network between users in a communication system the method…
XXXXXXXXXXXXXXXXXXXXXXXXXX
75

JP2005135317A

(Katsuhiko Takachio, 2005)
(Original Assignee) Toshiba Solutions Corp; 東芝ソリューション株式会社     Document management system and document management program processing data コンピュ

selected one スタイル

XXXXXX
76

US20040111410A1

(David Burgoon, 2004)
(Original Assignee) Battelle Memorial Institute Inc     

(Current Assignee)
Battelle Memorial Institute Inc
Information reservoir different schema original query

s corresponding data partition to form corresponding intermediate data, corresponding intermediate data sampling rate

different key random number

includes data includes data, real time

data set, first data set said system

partitioning step, intermediate data processing step above steps

second set second set

first set first set

first data said sub

particular data group n points

output data set n value

XXXXXXXXXXXXXXXXXXXXXX
77

WO2005013139A1

(Nam-Yul Lee, 2005)
(Original Assignee) Nitgen Technologies Inc.     A contents synchronization system in network environment and a method therefor mapping functions management function, encryption function

particular data predetermined times

different intermediate data User Interface

combine task current status

processing data, intermediate data processing other region

data set, first data set said system

second set said server

first data, first data group time t

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses hostnames for data including a domain name and value indicating information about the object col…

teaches a media element library that is able to return the status of robots which includes the availability of robots…

teaches the query can be selected using a least-recently-used algorithm see…

discloses a data transfer system in which a plurality of sites are connected over a network ie see at least…
XXXXXXXXXXXXXXXXXX
78

JP2005099107A

(Hiroto Endo, 2005)
(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社     Data reproduction apparatus first set, first data set 選択手段

reducing operations 動作状態

different schema メモリ

XXXXXX
79

JP2005092707A

(Atsuji Nagahara, 2005)
(Original Assignee) Seiko Epson Corp; セイコーエプソン株式会社     Similarity calculation system, similarity calculation program, and similarity calculation method corresponding different intermediate data のプログラム

data group, first data group えること

processing data コンピュ

XXXXXXXXXXXXXXXXXX
80

US20040117345A1

(Roger Bamford, 2004)
(Original Assignee) Oracle International Corp     

(Current Assignee)
Oracle International Corp
Ownership reassignment in a shared-nothing database system data partitions, partitioning step based partitioning

different schema persistent data

computer system stored data

first data, first data group first data, time t

second set second set

first set first set

XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
81

WO2004028108A2

(Jean-Marie Vau, 2004)
(Original Assignee) Eastman Kodak Company     Method for archiving multimedia messages second set, reduce method following steps

reducing step first terminal

second data second data

first data first data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses every limitation claimed as applied above see claim…

discloses further including a select instruction associated with the instruction field for printing a plurality of text…

discloses wherein receiving the request for telecommunications service comprises receiving a user…

discloses the feature exchanging information over a control channel see col…
XXXXXXXXXXXXXXX
82

EP1511232A1

(Miguel De Vega Rodrigo, 2005)
(Original Assignee) Siemens AG; Nokia Siemens Networks GmbH and Co KG     

(Current Assignee)
Nokia Solutions and Networks GmbH and Co KG
A method for transmission of data packets through a network second data, second data group said time

first data, first data group time t

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches that bandwidth is derived in accordance with an average aggregate channel capacity see col…

teaches the one or more sources are a plurality of sources eg see…

discloses a weight factor the utility function is equivalent to a weight factor for example see figures…

teaches all the claimed limitations as previously discussed with respect to claim…
XXXX
83

JP2005025303A

(Yukio Nakano, 2005)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Apparatus, method, and program for managing partitioned database storage corresponding different intermediate data のプログラム

computer system 行うこと

processing data コンピュ

data groups, output data groups NAS

XXXXXXXXXXXXXXXXXXXXXXXXXXXX
84

US20040036716A1

(Jena Jordahl, 2004)
(Original Assignee) Jordahl Jena J.     

(Current Assignee)
GLOBAL CONNECT TECHNOLOGY Inc
Data storage, retrieval, manipulation and display tools enabling multiple hierarchical points of view first data group weighting function

first set, second set confidence levels, data sets

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses a data normalization layer for normalizing data types from multiple data sources and a data cleansing layer…

discloses wherein the data model includes a cohorts analysis metric operable to provide a flexible mechanism for…

discloses a method and apparatus for facilitating and controlling a buyer driven market where prospective buyers of…

teaches wherein the data page comprises clustered index leaf pages…
XXXXXXXXXX
85

JP2004362449A

(Yuji Aoki, 2004)
(Original Assignee) Mitsubishi Electric Corp; 三菱電機株式会社     Service providing apparatus, service coordinator apparatus, service providing method, service coordinating method, program, and computer-readable recording medium storing the program corresponding different intermediate data のプログラム

reduce method 少なくとも

processing data コンピュ

second intermediate data, second intermediate data set 記憶部

XXXXXXXXXXXX
86

US20040249789A1

(Rahul Kapoor, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Duplicate data elimination system output data set more data records

computer system evaluation data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches allowing a user to search music based on music title…

discloses the method the system and the computer readable medium according to claim…

teaches the parameters of the scoring function are learned by a machine learning algorithm…

discloses a method for providing search results to a user further comprising generating a second subset of the first…
XXXXXXXXXXXXXXXXXXXXXXXXX
87

US20030208468A1

(David McNab, 2003)
(Original Assignee) EXCHANGE SYNERGISM Ltd     

(Current Assignee)
Objective Business Services Inc
Method, system and apparatus for measuring and analyzing customer business volume computer system computer system

includes data includes data

output data set n value

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses a networked personal contact manager title comprising address books for multiple users where users can link to…

teaches displaying an email address as a unique string col…

teaches the claimed subject matter as discussed above except that…

teaches personality assessment data for the particular entity the behavior assessment data for the particular entity…
XXXXXXXXXXXXXXXXXXXXXXXXXXX
88

US20040199533A1

(Pedro Celis, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Associative hash partitioning second set specific attribute

data set, first data set said system

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses wherein the data model includes a cohorts analysis metric operable to provide a flexible mechanism for…

discloses the requested first plurality of updates comprising deleting the first subset of the first data packages…

teaches wherein the data page comprises clustered index leaf pages…

teaches wherein processing an intermediate node includes processing or skipping any data elements in the next level…
XXXXXXXXXX
89

US20030227924A1

(Muralidharan Kodialam, 2003)
(Original Assignee) Nokia of America Corp     

(Current Assignee)
Nokia of America Corp
Capacity allocation for networks having path length routing constraints first data, first data group repeating step

data groups shortest path

second set second set

first set first set

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses in conjunction with the sorting matching together those of the table formation patterns that are similar for…

describes a method for cost determination for paths between switches in a mesh fig…

teaches the claimed feature of establishing by the one or more computing devices a second communication channel as a…

teaches a method of operating a service provider system ISP the method comprising…
XXXXXXXX
90

US6990480B1

(F. N. Burt, 2006)
(Original Assignee) Trancept Ltd     

(Current Assignee)
Trancept Ltd
Information manager method and system data groups respective value

second set, reduce method following steps

different lists given set

output data set n value

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches the use of a spreadsheet template to make changes to spreadsheet programs…

discloses using macros to manipulate the format of the spreadsheet application…

teaches further comprising at least one identifier ID element respectively indicative of the at least one column…

teaches a calculated field and a script for performing the calculations col…
XXXXXXXXXX
91

JP2004226214A

(Hidenori Maeda, 2004)
(Original Assignee) Inkurimento P Kk; インクリメント・ピー株式会社     Map information processing apparatus, and system, method, program, and recording medium storing the program therefor s corresponding data partition to form corresponding intermediate data, corresponding intermediate data 有し一対

reduce method システム

XXXXXX
92

CN1517906A

(鹏 张, 2004)
(Original Assignee) Lenovo Beijing Ltd     

(Current Assignee)
Lenovo Beijing Ltd
File system and file management method first schema 逻辑组合

different schema 数据文件

first data, first data group 第二个

partitioning step 进一步

XXXXXXX
93

US20040122803A1

(Byron Dom, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Detect and qualify relationships between people and find the best path through the resulting social network different lists list information

data group last access

XXXXXXXXXXXXXXXX
94

JP2004178253A

(Kenji Ishii, 2004)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Storage device control apparatus and control method for the storage device control apparatus reduce method 少なくとも

data group, first data group えること

providing metadata 値以下

XXXXXXXXXXXXXXXXXXX
95

US20030149934A1

(Robert Worden, 2003)
(Original Assignee) CHARTERIS PLC     

(Current Assignee)
CHARTERIS PLC
Computer program connecting the structure of a xml document to its underlying meaning second set, reduce method following steps

mapping functions same function

reducing step high level

first data, first data group time t

XXXXXXXXXXXXXXX
96

US20040088147A1

(Qian Wang, 2004)
(Original Assignee) Hewlett Packard Development Co LP     

(Current Assignee)
Valtrus Innovations Ltd ; Hewlett Packard Enterprise Development LP
Global data placement processing data, computing devices bandwidth constraint, first edge

second set including one

different key upper limit

first intermediate data set  ∑

output data set n value

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses wherein said at least one performability parameter comprises at least one selected from the group consisting…

discloses the transmission of final encoded video image data to a set-top box on the receiver end of a user via television…

teaches the method comprising monitoring requests for content at a first cache component over time to dynamically…

teaches distributing the service request to the plurality of service servers see col…
XXXXXXXXXX
97

US7047253B1

(Ravi Murthy, 2006)
(Original Assignee) Oracle International Corp     

(Current Assignee)
Oracle International Corp
Mechanisms for storing content and properties of hierarchically organized resources different schema receiving input

providing metadata stores metadata

mapping functions have values

first data first data

XXXXXXXXX
98

EP1294144A2

(Steve M. Simmons, 2003)
(Original Assignee) Chiaro Networks Ltd     

(Current Assignee)
CHIARO NETWORKS LTD. ; Chiaro Networks Ltd
System and method for router data distribution second set said server

particular data, particular data group one source

second data, second data group said time

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses of the retransmission of the information requested by the retransmission request depending on the multicast…

teaches wherein each of said data blocks includes a date/time stamp of a last update for said data block, claim…

teaches a retransmission control method in a multicast service providing system in which an information delivery…

discloses a background process will maintain the environment see column…
XXXXXX
99

US20040030703A1

(Serge Bourbonnais, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Method, system, and program for merging log entries from multiple recovery log files reducing operations comprises instructions

data groups, output data groups several data

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses the files placed in the recycle bin are compressed…

teaches wherein the sorted list is generated using a sorting key that includes a page and a start slot wherein the…

teaches using messages to communicate by messages with the target location that contain the timestamp or generated on…

teaches identifying committed pages and rows from each list…
XXXXXXXXXXXX
100

JP2004038313A

(Makoto Mihara, 2004)
(Original Assignee) Canon Inc; キヤノン株式会社     Log acquisition method, program, and storage medium processing data コンピュ

data group, first data group えること

second intermediate data ファイル

XXXXXXXXXXXXXXXXXXXXXXXX
101

US20040003086A1

(Jeffrey Parham, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Re-partitioning directories first data, first data set communicatively couple

first data group user service

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches that the upstream proxy server side proxy has a cache…

teaches using metrics about workload received at the resource from at least one client to derive monitoring feedback…

teaches using conditional getting instructions to a prefetch access and utilizing latest data and time of updating as…

discloses all the claimed subject matter as set forth above however…
XXXXXXXXXX
102

US20030217033A1

(Zigmund Sandler, 2003)
(Original Assignee) Aleri Inc     

(Current Assignee)
Sybase Inc
Database system and methods partitioning step second resource

different intermediate data, different key second minimum

reducing operations receiving step

second data second data

first data first data

XXXXXXX
103

US20030212664A1

(Martin Breining, 2003)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Querying markup language data sources using a relational query processor different schema Extensible Markup Language

mapping functions data definition

first data second program, first program

first set first type

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses indices evolve at least in part by providing subsequent users with summary comparison usage information based…

teaches that this present invention relates to query processing and more specifically relates to techniques for…

discloses the method the system and the computer readable medium according to claim…

discloses enforce the following rules when sending data to a receiver that self announces as a strictly aligned receiver…
XXXXXX
104

EP1227396A1

(Donald J. Kadyk, 2002)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Corp
A method, system and computer program product for synchronizing data represented by different data structures by using update notifications second data second data

first data first data

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches associating a device with multiple clients note that…

discloses a method for resuming an interruption of a previous incomplete synchronization session related to an exchange…

discloses the invention substantially as described in claims…

teaches such a feature user profile database stores user profiles that define a user device that can be one of many…
XXXX
105

JP2003157249A

(Koji Ito, 2003)
(Original Assignee) Degital Works Kk; ディジタル・ワークス株式会社     Method for compressed storage of documents reducing operations 前記ノード

second intermediate data ファイル

computer system 行うこと

different schema メモリ

XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
106

US20020184401A1

(Richard Kadel, 2002)
(Original Assignee) Polexis Inc     

(Current Assignee)
Polexis Inc
Extensible information system reducing operations readable instructions

second set, second data set selected attribute

includes data includes data

data partitions desired form

different schema XML schema

value pairs data items

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses a computer-implemented method comprising receiving a query…

describes properties and services for each business attribute that is bound to the input field the framework comprising…

teaches wherein the operating system component comprises a resource manager component col…

discloses identifying metadata based on the query and analyzing associated metadata for each image in the first…
XXXXXXXXXX
107

US20030074348A1

(Paul Sinclair, 2003)
(Original Assignee) NCR Corp     

(Current Assignee)
Teradata US Inc
Partitioned database system first set, second set second functions

value pairs second values

computing devices first portion

output data set n value

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses all of the elements of the claimed invention but fails to explicitly disclose the standard processing unit and…

discloses a method of deleting object data from a relational database as discussed in claim…

discloses a method and apparatus for facilitating and controlling a buyer driven market where prospective buyers of…

discloses receiving a request for withdrawal of an offer and retracting the offer via deleting the offer in question…
XXXXXX
108

US6553371B2

(Humberto Gutierrez-Rivas, 2003)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Method and system for specifying and displaying table joins in relational database queries second set, reduce method following steps

computing devices display device

particular reducer down list

XXXXXXX
109

US20030078958A1

(Charles Pace, 2003)
(Original Assignee) Op40 Inc     

(Current Assignee)
Op40 Inc
Method and system for deploying an asset over a multi-tiered network second data second data structure

computer system computer system

first data first data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses all the claimed subject matter as set forth above in claim…

teaches a system for accepting bids from advertisers to serve ads a search result is an ad to a searcher in a search…

teaches generating a rank value for each keyword the higher the rank value the top position the keyword will list and…

teaches when the user double-clicks on the payment image in the…
XXXXXXXXXXXXXXXXXXXXXXXXX
110

JP2003006021A

(Nobuo Kawamura, 2003)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Database system, database management method, and program corresponding different intermediate data のプログラム

processing data コンピュ

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches receiving a new data record and a key that is associated with the new data record col…

discloses receiving a start offset of data to be accessed computing a cluster index by dividing said start offset by a…

teaches logic that makes a determination whether an old cluster file that is currently being written into can handle…

discloses temporarily storing part of the plurality of cluster col…
XXXXX
111

US7086085B1

(Bruce E Brown, 2006)
(Original Assignee) iLumin Corp     

(Current Assignee)
iLumin Corp
Variable trust levels for authentication different key performing authentication

different schema receiving input

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches the invention substantially as claimed and described in claim…

discloses the deployment of the application in a sandbox to protect it and other applications…

discloses receiving an access request related to first data of the file system data at the broker service from the…

discloses a chat network maintained by a system of one or more computers for transmitting text audio audiovisual and…
XXXXXX
112

US6816854B2

(David Reiner, 2004)
(Original Assignee) Sun Microsystems Inc     

(Current Assignee)
Sun Microsystems Inc
Method and apparatus for database query decomposition particular data database queries

computer system computer system

XXXXXXXXXXXXXXXXXXXXXXXXX
113

US6768986B2

(Jean-Yves Cras, 2004)
(Original Assignee) SAP France SA     

(Current Assignee)
Business Objects Software Ltd
Mapping of an RDBMS schema onto a multidimensional data model computer system greatest number

first data, first data group repeating step

different schema data model

second schema leaf level

XXXXXXXXXXXXXXXXXXXXXXXXXX
114

EP1207464A2

(Yang-Lim Choi, 2002)
(Original Assignee) Samsung Electronics Co Ltd; University of California     

(Current Assignee)
Samsung Electronics Co Ltd ; University of California
Database indexing using a tree structure computer system preceding step

second schema data elements

first data said sub

second data group steps a

output data set n value

XXXXXXXXXXXXXXXXXXXXXXXXXX
115

US20020091677A1

(Mandayam Sridhar, 2002)
(Original Assignee) AMPERSAND Corp     

(Current Assignee)
AMPERSAND Corp
Content dereferencing in website development second set specific attribute

different schema data model

XXXXXX
116

JP2002197099A

(Koji Ito, 2002)
(Original Assignee) Degital Works Kk; ディジタル・ワークス株式会社     Database processing method particular data 読み出す段階

data group, first data group えること

different key のキー

XXXXXXXXXXXXXXXX
117

US20010051881A1

(Aaron Filler, 2001)
(Original Assignee) NEUROGRAFIX     

(Current Assignee)
NEUROGRAFIX
System, method and article of manufacture for managing a medical services network computer system data capture

reduce method said camera

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches the system wherein the computer is further programmed to…

teaches to an integrated point of care system that includes an identification device for identifying the patient…

discloses a plurality of billing discounts each of the billing discounts associated with one of a plurality of customer…

teaches receiving consent from the user to send the health record to the destination address…
XXXXXXXXXXXXXXXXXXXXXXXX
118

EP1130872A1

(Aleksandr Stolyar, 2001)
(Original Assignee) Nokia of America Corp     

(Current Assignee)
Nokia of America Corp
Method of packet scheduling, with improved delay performance, for wireless networks output data set n value

first data, first data group time t

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses that some users will desire to receive transmissions for a specified delay and some users will desire to…

teaches a first set of codes a second set of codes a third set of codes a fourth set of codes and fifth set of codes…

teaches for causing the computer to calculate remaining resources based on the announcement…

discloses receiving plurality of the data packets determining a packet age value for each received packet see col…
XXXXXX
119

US6609123B1

(Henk Cazemier, 2003)
(Original Assignee) Cognos Inc     

(Current Assignee)
International Business Machines Corp
Query engine and method for querying data using metadata model different schema individual component

different intermediate data reference object

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches forming inode chunk having a bitmap indicating which inodes are actually in use wherein the reading of unused…

teaches wherein the sequential scan further includes performing a read pattern sequentially on a data storage device…

discloses generating a query based at least in part on a topic of interest generating a query col…

teaches a relational database contains attributes corresponding to search request and relational database is…
XXXXXX
120

EP1077413A2

(Junichi c/o Sony Computer Science Lab. Rekimoto, 2001)
(Original Assignee) Sony Corp     

(Current Assignee)
Sony Corp
Data access history indicating method and apparatus includes data recording means

second data, second data group said time

XXXXXXXX
121

JP2002024192A

(Yoshiko Tamaoki, 2002)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     Computer resource partitioning apparatus and resource partitioning method reduce method 少なくとも

second intermediate data ファイル

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches an apparatus comprising a processor and a memory storing computer readable instructions execution of the…

discloses that such a feature enables carriers to provide more customized SLAs to users and thereby improve service…

teaches of wireless networking with dynamic load sharing and balancing specifically wherein providing the data to a…

teaches the digital signature being received via a local area network and the data is received via a storage area…
XXXXXXXX
122

US6721749B1

(Tarek Najm, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Populating a data warehouse using a pipeline approach different lists data processing program

different intermediate data further process

particular reducer more log

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches the invention substantially as claimed as noted above…

discloses applying a hash function to a concatenation of said block identifier and an identifier of a respective one of…

discloses tag information that indicates coordinate information about print data indicating a page orientation…

discloses all aspects of the claimed invention except communication system supports the…
XXXXXX
123

US6381611B1

(James Roberge, 2002)
(Original Assignee) Cyberpulse LLC     

(Current Assignee)
Ascend Hit LLC
Method and system for navigation and data entry in hierarchically-organized database views computer system computer system

first set, second set first one

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches a third boundary image is displayed at the same position as a first boundary image see…

discloses indexing media content on the internet comprising a mediax file containing a hierarchy of metadata however…

teaches a method and corresponding system and program product for providing a compact interface for display of an…

teaches a method for creating a highly connected network of nodes indicative of computer-readable data…
XXXXXXXXXXXXXXXXXXXXXXXX
124

JP2001331332A

(Fumio Kajiwara, 2001)
(Original Assignee) Nippon Telegr & Teleph Corp <Ntt>; 日本電信電話株式会社     Resource reservation method for an application system, reservation apparatus, resource amount estimation apparatus, and computer system combine task 前記リソース

processing data コンピュ

computer system 行うこと

XXXXXXXXXXXXXXXXXXXXXXXXXXXX
125

JP2001298453A

(Hantai Takahashi, 2001)
(Original Assignee) Fuji Xerox Co Ltd; 富士ゼロックス株式会社     Network display device processing data コンピュ

providing metadata 関係付け

different schema メモリ

XXXXXXXXXXX
126

US7103836B1

(Lee Evan Nakamura, 2006)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Method and system for generating materials for presentation on a non-frame capable web browser computing devices first portion

second set said server

XXX
127

US6505187B1

(Ambuj Shatdal, 2003)
(Original Assignee) NCR Corp     

(Current Assignee)
Teradata US Inc
Computing multiple order-based functions in a parallel processing database system data partitions different partition

first set, second set first one

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches all of the claimed subject matter as discussed above with respect to claim…

teaches maintaining database includes drug companies insures medical facilities…

teaches information gathered by the practitioner is uploaded to the server…

teaches wherein upon going to the second level the traversal of the set of data elements includes determining if there…
XXXX
128

US6735593B1

(Simon Guy Williams, 2004)
(Original Assignee) ANSWERBRISK Ltd; LAZY SOFTWARE Ltd OF BIRCHES     

(Current Assignee)
SLIGO INNOVATIONS LLC
Systems and methods for storing data second set including one

first schema defines one

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses generating global node rank information and generating keyword-specific node rank information using the global…

teaches that it is known to customize browser interfaces with…

discloses wherein the combining the multiple initial rankings comprises combining based on user definable adjusting…

teaches that this present invention relates to query processing and more specifically relates to techniques for…
XX
129

US6609131B1

(Mohamed Zait, 2003)
(Original Assignee) Oracle International Corp     

(Current Assignee)
Oracle International Corp
Parallel partition-wise joins data partitions, partitioning step first partition, second subsets

intermediate data set said first set

second set second set

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses that deletion of a model can only be done by the owner who will have an owner identification…

teaches said model may be decommissioned only when dependencies are resolved…

teaches a tangible computer-readable medium according to claim…

teaches receiving a request to decommission a simulation model associated with said performance data wherein said…
XXXXXXXXXXX
130

US6446048B1

(Michael L. Wells, 2002)
(Original Assignee) Intuit Inc     

(Current Assignee)
Intuit Inc
Web-based entry of financial transaction information and subsequent download of such information data partition, first data group sending information

value pairs, value pair respective users

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches completed questions that when analyzed see if claimant qualifies for long term disability benefits see…

teaches method and apparatus that processes financial data relating to wealth accumulation plans…

teaches survey questions relating to income sources and several specific types of income but fails to specifically teach…

teaches method of evaluating a permanent life insurance policy…
XXXX
131

US6330653B1

(Golden E. Murray, 2001)
(Original Assignee) Powerquest Corp     

(Current Assignee)
Veritas Technologies LLC
Manipulation of virtual and live computer storage device partitions particular data, data partition simulation result, allocation table

computer system computer system

first intermediate data, first intermediate data set same computer

output data groups free space

first set first set

second data group then c

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches disk imaging programs do not copy user data file-by-file but instead copy data cluster-by-cluster or…

teaches of a computer system comprising a central processing unit CPU fig…

discloses the invention substantially as claimed including a partition recovery method…

discloses a method to ensure that data is only accessed using the proper identifier…
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
132

JPH11327982A

(Mark Lawrence Blood, 1999)
(Original Assignee) Lucent Technol Inc; ルーセント テクノロジーズ インコーポレイテッド     Distributed database system failure recovery method second set, first set 有する第1, アップ

processing data コンピュ

reduce method システム

different schema メモリ

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses a distributed system using a quorum redundancy method in which a redundancy process is executed by at least Q…

discloses reporting directory to the second web site to update data periodically col…

teaches the stored data includes items of the database comprises objects in an object database col…

teaches wherein the first bulk delete timestamp and the second bulk delete timestamp comprise information describing…
XXXXXXXXXXXXXX
133

US6101495A

(Masashi Tsuchida, 2000)
(Original Assignee) Hitachi Ltd     

(Current Assignee)
Hitachi Ltd
Method of executing partition operations in a parallel database system first data, data partition new partitions

first data group specific value

second set, value pairs current data

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses wherein rows of the base table are stored in table partitions and wherein there is one index partition for…

teaches where dividing includes identifying one of a first group of words see col…

teaches or suggests reading data from a source converting the extracted data into the form the data needs to be in and…

discloses an intelligent distributed file system enables the storing of file data among a plurality of smart storage…
XXXX
134

US6405198B1

(Roger Georges Bitar, 2002)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Complex data query support in a partitioned database system second set including one

different schema data object

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches allowing a user to search music based on music title…

discloses the limitations substantially similar to those in claim…

discloses the method the system and the computer readable medium according to claim…

teaches the parameters of the scoring function are learned by a machine learning algorithm…
XXXXXX
135

US6278989B1

(Surajit Chaudhuri, 2001)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Histogram construction using adaptive random sampling with cross-validation for database systems partitioning step partitioning step

first data, first data group repeating step

processing data, computer system square root

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches the claimed invention as described above see claim…

discloses indices evolve at least in part by providing subsequent users with summary comparison usage information based…

discloses receiving search term from user matching term with search chronicle and providing result of the search…

discloses a search engine system and method in which displaying the subset in a first display area and displaying the…
XXXXXXXXXXXXXXXXXXXXXXXXXXX
136

US6353818B1

(Felipe Carino, 2002)
(Original Assignee) NCR Corp     

(Current Assignee)
Teradata US Inc
Plan-per-tuple optimizing of database queries with user-defined functions first data, first data set communicatively couple, first data

first data group non-volatile memory

different schema compile time

second data second data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches a system for accepting bids from advertisers to serve ads a search result is an ad to a searcher in a search…

discloses wherein the data model includes a cohorts analysis metric operable to provide a flexible mechanism for…

teaches wherein the data page comprises clustered index leaf pages…

teaches generating a rank value for each keyword the higher the rank value the top position the keyword will list and…
XXXXXXXXXXXX
137

US6223182B1

(Nipun Agarwal, 2001)
(Original Assignee) Oracle Corp     

(Current Assignee)
Oracle International Corp
Dynamic data organization particular data group more process

computing devices first column

processing data, computer system square root

first data first data

XXXXXXXXXXXXXXXXXXXXXXXXXXX
138

US6167405A

(Kenneth R. Rosensteel, 2000)
(Original Assignee) Bull HN Information Systems Inc     

(Current Assignee)
Bull HN Information Systems Inc
Method and apparatus for automatically populating a data warehouse system data partition, data group information representative, first control

mapping functions corresponding port

data partitions data replication

particular data group source entities

first set first type

XXXXXXXXXXXXXXXX
139

US6509898B2

(Ed H. Chi, 2003)
(Original Assignee) Xerox Corp     

(Current Assignee)
Google LLC
Usage based methods of traversing and displaying generalized graph structures computing devices display device

mapping functions depth h

XXXX
140

US5943663A

(Gary C. Mouradian, 1999)
(Original Assignee) Mouradian; Gary C.     Data processing method and system utilizing parallel processing particular reducer representing data

reduce method potential object

data groups, computer system new pressure

second data said memory

XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
141

US6321374B1

(David Mun-Hien Choy, 2001)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Application-independent generator to generate a database transaction manager in heterogeneous information systems groups having different schema associated parameters

first set, data set input file

data partitions one file

XXXXXXXXXX




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
USENIX Association Proceedings Of The Sixth Symposium On Operating Systems Design And Implementation (OSDI '04) : 137-149 2004

Publication Year: 2004

MapReduce: Simplified Data Processing On Large Clusters

No Affiliation

Dean, Ghemawat, Usenix
US8190610B2
CLAIM 1
. A method of processing data of a data set (data sets) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .
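For illustration only, the programming model summarized in the abstract above can be sketched in single-process Python. This is a minimal sketch and not the Dean and Ghemawat implementation: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, the word-count workload is assumed as the example, and partitioning, scheduling, and fault handling are omitted.

```python
from collections import defaultdict

def map_fn(key, value):
    """Emit one intermediate (word, 1) pair per word in the document text."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """Merge all intermediate values that share the same intermediate key."""
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the runtime: map, group by key, reduce."""
    intermediate = defaultdict(list)
    for key, value in records:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    return dict(reduce_fn(k, vs) for k, vs in sorted(intermediate.items()))

# Hypothetical input: (document id, document text) pairs.
docs = [("d1", "map reduce map"), ("d2", "reduce")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Note that this sketch handles a single homogeneous input; it does not tag intermediate data by data group.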

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (data sets) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (data sets) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (data sets) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (data sets) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .
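To make the claim language of claim 33 concrete, the following is a minimal sketch, under stated assumptions, of mapping two data groups with different schemas and reducing them together on a key in common. The employee and department records, the `dept_id` key, and every function name are hypothetical and chosen only for illustration; the sketch is neither the patentee's implementation nor anything disclosed in the cited paper.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key "dept_id".
employees = [
    {"dept_id": 1, "name": "Ann"},
    {"dept_id": 2, "name": "Bob"},
    {"dept_id": 1, "name": "Cy"},
]
departments = [{"dept_id": 1, "title": "R&D"}, {"dept_id": 2, "title": "Sales"}]

def map_employee(rec):
    """Map function for the first group: emit (dept_id, name)."""
    yield rec["dept_id"], rec["name"]

def map_department(rec):
    """Map function for the second group: emit (dept_id, title)."""
    yield rec["dept_id"], rec["title"]

def grouped_map(groups):
    """Run each group's own map function; keep intermediate values tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for rec in records:
            for key, value in map_fn(rec):
                intermediate[key][group_name].append(value)
    return intermediate

def reduce_join(key, per_group):
    """Iterate over each group's values separately and merge them on the common key."""
    for title in per_group["dept"]:
        for name in per_group["emp"]:
            yield {"dept_id": key, "title": title, "name": name}

groups = {"emp": (employees, map_employee), "dept": (departments, map_department)}
inter = grouped_map(groups)
joined = [row for key in sorted(inter) for row in reduce_join(key, inter[key])]
```

Each output row combines fields from both groups, so the output schema differs from both input schemas.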

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (data sets) so that the output data set is a merging of a portion of the first and second intermediate data set .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (data sets) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .
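As a hedged illustration of the partitioning recited in claim 35, the sketch below assigns intermediate key-value pairs to reducer partitions. Hashing the key modulo the number of reducers follows the default partitioning function described in the MapReduce paper; the `replicate` flag, which sends every pair to every partition, is a hypothetical device added here only to show pairs being provided to more than one partition. All names are assumptions.

```python
import zlib

def partition_index(key, num_reducers):
    """Deterministic hash(key) mod R; crc32 avoids Python's per-run hash salting."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers

def partition_pairs(pairs, num_reducers, replicate=False):
    """Split intermediate pairs into one list per reducer.

    With replicate=True every pair is copied to every partition (hypothetical
    illustration of pairs provided to more than one partition).
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        if replicate:
            for part in partitions:
                part.append((key, value))
        else:
            partitions[partition_index(key, num_reducers)].append((key, value))
    return partitions

# Hypothetical intermediate data sets from two data groups.
first = [("k1", "a"), ("k2", "b"), ("k3", "c")]
second = [("k1", "x")]
hashed = partition_pairs(first, 2)
broadcast = partition_pairs(second, 2, replicate=True)
```

Each inner list would then be handed to a separate reducer.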

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (data sets) are provided to all of the reducers .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (data sets) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (data sets) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (data sets) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (data sets) so that the output data set is a merging of a portion of the first and second intermediate data set .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (data sets) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .
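The partitioning step recited in the claim text above can be sketched as follows, assuming a simple hash partitioner plus an optional set of keys that are copied to every partition. The function names, the replicated_keys parameter and the use of CRC32 as the hash are illustrative assumptions, not details taken from the patent or the cited paper.

```python
import zlib


def hash_partition(key, num_reducers):
    # Stable hash so the same key always maps to the same reducer index.
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def partition_intermediate(pairs, num_reducers, replicated_keys=()):
    # Returns one list of key-value pairs per reducer.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        if key in replicated_keys:
            # Hypothetical rule: selected keys go to every partition,
            # so some pairs are provided to more than one partition.
            targets = range(num_reducers)
        else:
            targets = [hash_partition(key, num_reducers)]
        for index in targets:
            partitions[index].append((key, value))
    return partitions
```

Each returned list would then be handed to a separate reducer. Passing every key in replicated_keys gives the degenerate case in which all pairs reach all reducers.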

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (data sets) are provided to all of the reducers .
MapReduce : Simplified Data Processing On Large Clusters . MapReduce is a programming model and an associated implementation for processing and generating large data sets (first set, second set, data set, output data set) . Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs , and a reduce function that merges all intermediate values associated with the same intermediate key . Many real world tasks are expressible in this model , as shown in the paper . Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines . The run-time system takes care of the details of partitioning the input data , scheduling the program's execution across a set of machines , handling machine failures , and managing the required inter-machine communication . This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system . Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable : a typical MapReduce computation processes many terabytes of data on thousands of machines . Programmers find the system easy to use : hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day .




CONCURRENCY-PRACTICE AND EXPERIENCE. 9 (9): 897-914 SEP 1997

Publication Year: 1997

A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space

The Japan Atomic Energy Research Institute (JAERI), The University of Electro-Communications (電気通信大学, Denki-Tsūshin Daigaku)

Koide, Suzuki, Nakayama
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space . This paper proposes a new memory allocation method for shared memory multiprocessors with large virtual address spaces . An evaluation of its performance is also presented . For effective use of shared memory multiprocessors , it is important that no processor's execution is blocked . If several processors simultaneously access a shared variable , their processes are blocked and access to the variable is serialized . Thus , frequent access to shared variables reduces the parallelism . In particular , the parallelism is significantly reduced when a special shared variable - the 'allocation pointer' - is frequently accessed in the dynamic object allocation by an application program . In this paper , we propose a new method for allocating physical memory pages where the allocation pointer is monotonically increased in the virtual address space in contrast to the conventional method . This allows the critical sections for access to the allocation pointer to be executed effectively and atomically by using the fetch-and-add primitive . Our method improves the application program's parallelism by access to the allocation pointer with considerably short blocking time t (first data, first data group) o the process . (C) 1997 by John Wiley & Sons , Ltd .
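The allocation-pointer idea summarized in the abstract above can be sketched as a bump allocator whose pointer only ever increases and is advanced by a fetch-and-add operation. Python exposes no hardware fetch-and-add, so this sketch emulates it with a lock; the class name, function names and sizes are hypothetical, and the paper's handling of physical memory pages is not modeled.

```python
import threading


class AllocationPointer:
    """Monotonically increasing pointer advanced by an emulated fetch-and-add."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()  # stands in for a hardware atomic

    def fetch_and_add(self, size):
        # Return the old value and advance the pointer as one atomic step.
        with self._lock:
            old = self._value
            self._value += size
            return old


def allocate_concurrently(num_threads=4, allocs_per_thread=100, size=16):
    pointer = AllocationPointer()
    results = [[] for _ in range(num_threads)]

    def worker(slot):
        # Each allocation is a single fetch_and_add on the shared pointer.
        for _ in range(allocs_per_thread):
            results[slot].append(pointer.fetch_and_add(size))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect every returned start address in ascending order.
    return sorted(addr for block in results for addr in block)
```

Because the critical section is only the read-and-increment, every thread receives a distinct, non-overlapping address range regardless of interleaving.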

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (memory pages) of a different schema than the iterator corresponding to another particular data group , for that reducer .
A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space . This paper proposes a new memory allocation method for shared memory multiprocessors with large virtual address spaces . An evaluation of its performance is also presented . For effective use of shared memory multiprocessors , it is important that no processor's execution is blocked . If several processors simultaneously access a shared variable , their processes are blocked and access to the variable is serialized . Thus , frequent access to shared variables reduces the parallelism . In particular , the parallelism is significantly reduced when a special shared variable - the 'allocation pointer' - is frequently accessed in the dynamic object allocation by an application program . In this paper , we propose a new method for allocating physical memory pages (different key) where the allocation pointer is monotonically increased in the virtual address space in contrast to the conventional method . This allows the critical sections for access to the allocation pointer to be executed effectively and atomically by using the fetch-and-add primitive . Our method improves the application program's parallelism by access to the allocation pointer with considerably short blocking time to the process . (C) 1997 by John Wiley & Sons , Ltd .
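For orientation on the claim language above, in which a reducer uses a separate iterator for each data group, a minimal Python sketch might look like the following. The group names ("orders", "customers"), the nested-dictionary layout of the intermediate data and both function names are assumptions made purely for illustration.

```python
def group_iterator(intermediate, group_name, key):
    # Yield only the values that one data group produced for one key.
    for value in intermediate.get(group_name, {}).get(key, []):
        yield value


def reduce_with_iterators(key, intermediate):
    # One iterator per group; each is consumed in a group-specific way.
    order_total = sum(group_iterator(intermediate, "orders", key))
    names = list(group_iterator(intermediate, "customers", key))
    return {"customer_id": key, "names": names, "order_total": order_total}
```

Here the "orders" iterator is summed while the "customers" iterator is collected into a list, so the intermediate data of each group is processed in a manner specific to that group.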

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space . This paper proposes a new memory allocation method for shared memory multiprocessors with large virtual address spaces . An evaluation of its performance is also presented . For effective use of shared memory multiprocessors , it is important that no processor's execution is blocked . If several processors simultaneously access a shared variable , their processes are blocked and access to the variable is serialized . Thus , frequent access to shared variables reduces the parallelism . In particular , the parallelism is significantly reduced when a special shared variable - the 'allocation pointer' - is frequently accessed in the dynamic object allocation by an application program . In this paper , we propose a new method for allocating physical memory pages where the allocation pointer is monotonically increased in the virtual address space in contrast to the conventional method . This allows the critical sections for access to the allocation pointer to be executed effectively and atomically by using the fetch-and-add primitive . Our method improves the application program's parallelism by access to the allocation pointer with considerably short blocking time t (first data, first data group) o the process . (C) 1997 by John Wiley & Sons , Ltd .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (memory pages) of a different schema than the iterator corresponding to another particular data group , for that reducer .
A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space . This paper proposes a new memory allocation method for shared memory multiprocessors with large virtual address spaces . An evaluation of its performance is also presented . For effective use of shared memory multiprocessors , it is important that no processor's execution is blocked . If several processors simultaneously access a shared variable , their processes are blocked and access to the variable is serialized . Thus , frequent access to shared variables reduces the parallelism . In particular , the parallelism is significantly reduced when a special shared variable - the 'allocation pointer' - is frequently accessed in the dynamic object allocation by an application program . In this paper , we propose a new method for allocating physical memory pages (different key) where the allocation pointer is monotonically increased in the virtual address space in contrast to the conventional method . This allows the critical sections for access to the allocation pointer to be executed effectively and atomically by using the fetch-and-add primitive . Our method improves the application program's parallelism by access to the allocation pointer with considerably short blocking time to the process . (C) 1997 by John Wiley & Sons , Ltd .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space . This paper proposes a new memory allocation method for shared memory multiprocessors with large virtual address spaces . An evaluation of its performance is also presented . For effective use of shared memory multiprocessors , it is important that no processor's execution is blocked . If several processors simultaneously access a shared variable , their processes are blocked and access to the variable is serialized . Thus , frequent access to shared variables reduces the parallelism . In particular , the parallelism is significantly reduced when a special shared variable - the 'allocation pointer' - is frequently accessed in the dynamic object allocation by an application program . In this paper , we propose a new method for allocating physical memory pages where the allocation pointer is monotonically increased in the virtual address space in contrast to the conventional method . This allows the critical sections for access to the allocation pointer to be executed effectively and atomically by using the fetch-and-add primitive . Our method improves the application program's parallelism by access to the allocation pointer with considerably short blocking time t (first data, first data group) o the process . (C) 1997 by John Wiley & Sons , Ltd .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
A New Memory Allocation Method For Shared Memory Multiprocessors With Large Virtual Address Space . This paper proposes a new memory allocation method for shared memory multiprocessors with large virtual address spaces . An evaluation of its performance is also presented . For effective use of shared memory multiprocessors , it is important that no processor's execution is blocked . If several processors simultaneously access a shared variable , their processes are blocked and access to the variable is serialized . Thus , frequent access to shared variables reduces the parallelism . In particular , the parallelism is significantly reduced when a special shared variable - the 'allocation pointer' - is frequently accessed in the dynamic object allocation by an application program . In this paper , we propose a new method for allocating physical memory pages where the allocation pointer is monotonically increased in the virtual address space in contrast to the conventional method . This allows the critical sections for access to the allocation pointer to be executed effectively and atomically by using the fetch-and-add primitive . Our method improves the application program's parallelism by access to the allocation pointer with considerably short blocking time t (first data, first data group) o the process . (C) 1997 by John Wiley & Sons , Ltd .




SPEECH COMMUNICATION. 17 (3-4): 263-271 NOV 1995

Publication Year: 1995

INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE

Centro Studi e Laboratori Telecomunicazioni (CSELT S.p.A.)

Billi, Canavesio, Ciaramella, Nebbia
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (broader set) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition and text-to-speech applications , the activity of our lab encompasses now a broader set (mapping functions) of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 6
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (speech recognition) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , each key/value pair of the intermediate data being provided to a separate one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 7
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (speech recognition) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , at least some of the key/value pairs of the intermediate data being provided to more than one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 10
. The method of claim 9 , wherein : the reducing step (speech recognition) includes processing the metadata .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step (speech recognition) .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (speech recognition) is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step (speech recognition) further comprises processing data that is not intermediate data .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step (speech recognition) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step (speech recognition) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step (speech recognition) includes relating the data among the plurality of data groups .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (broader set) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition and text-to-speech applications , the activity of our lab encompasses now a broader set (mapping functions) of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (broader set) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition and text-to-speech applications , the activity of our lab encompasses now a broader set (mapping functions) of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 39
. The map-reduce method of claim 38 , wherein iterating includes providing the associated metadata to the processing of the reducing step (speech recognition) .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (broader set) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition and text-to-speech applications , the activity of our lab encompasses now a broader set (mapping functions) of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .

US8190610B2
CLAIM 46
. The computer system of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step (speech recognition) .
INTERACTIVE VOICE TECHNOLOGY AT WORK - THE CSELT EXPERIENCE . This paper is a survey of the speech technologies and applications developed at CSELT , some of which are employed in real services deployed in the Italian telephone network . With the rise of significant speech recognition (reducing step) and text-to-speech applications , the activity of our lab encompasses now a broader set of activities , from new algorithmic approaches to speech product engineering and application development . In particular , the paper gives an overview of the products originated from our speech technology research . It describes two operative applications , namely a voice dialing service for large name directories , which is installed in the CSELT PABX , and an automated network service for directory assistance , which is now accessible to all the Italian telephone customers .




LECTURE NOTES IN COMPUTER SCIENCE. 637: 404-425 1992

Publication Year: 1992

OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY

University of Texas, University of Illinois

Lam, Wilson, Moher
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (first order) of values are output for the corresponding different intermediate data (first order) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY . Most garbage collected systems have excessive need for RAM to achieve reasonable performance without too much paging . The reason for such poor locality is the way data are organized in the heap . Conventional organization approaches such as breadth-first ordering (different lists, different intermediate data) do not always bring objects in the same active working set together . When such co-active objects are distributed throughout the heap (on different memory pages) , high paging costs will result from accessing objects during execution . To alleviate such poor ordering , researchers have tried many different approaches : depth-first ordering , dynamic reorganization , object creation ordering , and hierarchical decomposition . Each of these approaches has its associated costs , effectiveness , and limitations . This paper presents a new ordering approach to improve locality . By paying a little attention to object type and format , effective heuristics can be derived to group co-active objects together . To investigate this idea , a number of such object type directed grouping techniques are incorporated into a Scheme-48 system . Page fault reduction of up to an order of magnitude was observed .
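
For orientation only, the map and merge steps recited in claim 1 might be sketched as follows. This is a minimal single-process illustration, not the patented implementation and not a distributed system. The data groups (employees, departments), their schemas, the shared key dept_id, and every function name are assumptions introduced here for illustration; none of them come from the patent or from the cited reference.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
employees = [(1, "Ada", 10), (2, "Lin", 20), (3, "Sam", 10)]  # (emp_id, name, dept_id)
departments = [(10, "Research"), (20, "Sales")]               # (dept_id, dept_name)


def map_employee(record):
    """Map function for the employees group: emit (dept_id, name)."""
    _emp_id, name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Map function for the departments group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def run_map(group_name, records, map_fn, num_partitions=2):
    """Partition one data group, map each partition, and tag the result with its group."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    intermediate = defaultdict(list)
    for partition in partitions:
        for record in partition:
            for key, value in map_fn(record):
                intermediate[key].append(value)
    return group_name, dict(intermediate)


def reduce_join(key, values_by_group):
    """Handle each group's values in its own way, then merge them on the common key."""
    dept_names = values_by_group.get("departments", [])
    emp_names = values_by_group.get("employees", [])
    return [(key, dept, emp) for dept in dept_names for emp in emp_names]


def map_reduce_merge(mapped_groups):
    """Collect the per-group intermediate data for every key and reduce it together."""
    keys = sorted({key for _name, inter in mapped_groups for key in inter})
    output = []
    for key in keys:
        values_by_group = {name: inter.get(key, []) for name, inter in mapped_groups}
        output.extend(reduce_join(key, values_by_group))
    return output


mapped = [
    run_map("employees", employees, map_employee),
    run_map("departments", departments, map_department),
]
joined = map_reduce_merge(mapped)
```

In this sketch the intermediate data stays identifiable to its data group because each mapped result carries its group name, and the reducer reads each group's list separately before joining on dept_id.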

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (memory pages) of a different schema than the iterator corresponding to another particular data group , for that reducer .
OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY . Most garbage collected systems have excessive need for RAM to achieve reasonable performance without too much paging . The reason for such poor locality is the way data are organized in the heap . Conventional organization approaches such as breadth-first ordering do not always bring objects in the same active working set together . When such co-active objects are distributed throughout the heap (on different memory pages (different key) ) , high paging costs will result from accessing objects during execution . To alleviate such poor ordering , researchers have tried many different approaches : depth-first ordering , dynamic reorganization , object creation ordering , and hierarchical decomposition . Each of these approaches has its associated costs , effectiveness , and limitations . This paper presents a new ordering approach to improve locality . By paying a little attention to object type and format , effective heuristics can be derived to group co-active objects together . To investigate this idea , a number of such object type directed grouping techniques are incorporated into a Scheme-48 system . Page fault reduction of up to an order of magnitude was observed .
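
One possible reading of the per-group iterators in claim 12 is sketched below, assuming each iterator walks its own group's intermediate values in an order keyed on a field of that group's schema. The group names (orders, customers), the tuple layouts, and the helper names are hypothetical and are labeled as such in the comments; the claim may well cover other arrangements.

```python
from operator import itemgetter


def group_iterator(values, sort_key):
    """Iterate one group's intermediate values in an order chosen for that group."""
    return iter(sorted(values, key=sort_key))


def reduce_one_key(key, values_by_group):
    """Reduce a single key using a separate iterator for each data group."""
    # Hypothetical orders schema: (order_id, amount); this iterator is keyed on order_id.
    orders = group_iterator(values_by_group.get("orders", []), itemgetter(0))
    # Hypothetical customers schema: (name, region); this iterator is keyed on name.
    customers = group_iterator(values_by_group.get("customers", []), itemgetter(0))
    names = [name for name, _region in customers]
    total = sum(amount for _order_id, amount in orders)
    return key, names, total


result = reduce_one_key(
    7,
    {"orders": [(102, 2.5), (101, 5.0)], "customers": [("Ada", "EU")]},
)
```

Here the two iterators inside one reducer operate on differently shaped values and are ordered by different fields, which is the distinction the sketch is meant to show.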

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (first order) of values are output for the corresponding different intermediate data (first order) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY . Most garbage collected systems have excessive need for RAM to achieve reasonable performance without too much paging . The reason for such poor locality is the way data are organized in the heap . Conventional organization approaches such as breadth-first ordering (different lists, different intermediate data) do not always bring objects in the same active working set together . When such co-active objects are distributed throughout the heap (on different memory pages) , high paging costs will result from accessing objects during execution . To alleviate such poor ordering , researchers have tried many different approaches : depth-first ordering , dynamic reorganization , object creation ordering , and hierarchical decomposition . Each of these approaches has its associated costs , effectiveness , and limitations . This paper presents a new ordering approach to improve locality . By paying a little attention to object type and format , effective heuristics can be derived to group co-active objects together . To investigate this idea , a number of such object type directed grouping techniques are incorporated into a Scheme-48 system . Page fault reduction of up to an order of magnitude was observed .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (memory pages) of a different schema than the iterator corresponding to another particular data group , for that reducer .
OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY . Most garbage collected systems have excessive need for RAM to achieve reasonable performance without too much paging . The reason for such poor locality is the way data are organized in the heap . Conventional organization approaches such as breadth-first ordering do not always bring objects in the same active working set together . When such co-active objects are distributed throughout the heap (on different memory pages (different key) ) , high paging costs will result from accessing objects during execution . To alleviate such poor ordering , researchers have tried many different approaches : depth-first ordering , dynamic reorganization , object creation ordering , and hierarchical decomposition . Each of these approaches has its associated costs , effectiveness , and limitations . This paper presents a new ordering approach to improve locality . By paying a little attention to object type and format , effective heuristics can be derived to group co-active objects together . To investigate this idea , a number of such object type directed grouping techniques are incorporated into a Scheme-48 system . Page fault reduction of up to an order of magnitude was observed .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data (working set) set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY . Most garbage collected systems have excessive need for RAM to achieve reasonable performance without too much paging . The reason for such poor locality is the way data are organized in the heap . Conventional organization approaches such as breadth-first ordering do not always bring objects in the same active working set (first intermediate data) together . When such co-active objects are distributed throughout the heap (on different memory pages) , high paging costs will result from accessing objects during execution . To alleviate such poor ordering , researchers have tried many different approaches : depth-first ordering , dynamic reorganization , object creation ordering , and hierarchical decomposition . Each of these approaches has its associated costs , effectiveness , and limitations . This paper presents a new ordering approach to improve locality . By paying a little attention to object type and format , effective heuristics can be derived to group co-active objects together . To investigate this idea , a number of such object type directed grouping techniques are incorporated into a Scheme-48 system . Page fault reduction of up to an order of magnitude was observed .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data (working set) set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
OBJECT TYPE DIRECTED GARBAGE COLLECTION TO IMPROVE LOCALITY . Most garbage collected systems have excessive need for RAM to achieve reasonable performance without too much paging . The reason for such poor locality is the way data are organized in the heap . Conventional organization approaches such as breadth-first ordering do not always bring objects in the same active working set (first intermediate data) together . When such co-active objects are distributed throughout the heap (on different memory pages) , high paging costs will result from accessing objects during execution . To alleviate such poor ordering , researchers have tried many different approaches : depth-first ordering , dynamic reorganization , object creation ordering , and hierarchical decomposition . Each of these approaches has its associated costs , effectiveness , and limitations . This paper presents a new ordering approach to improve locality . By paying a little attention to object type and format , effective heuristics can be derived to group co-active objects together . To investigate this idea , a number of such object type directed grouping techniques are incorporated into a Scheme-48 system . Page fault reduction of up to an order of magnitude was observed .




US20060190243A1

Filed: 2006-02-21     Issued: 2006-08-24

Method and apparatus for data management

(Original Assignee) Xeround Systems Ltd; Xeround Systems Inc     (Current Assignee) NORTHEND NETWORKS Ltd

Sharon Barkai, Gilad Zlotkin, Avi Vigder, Nir Klar, Yaniv Romem, Ayelet Shomer, Iris Kaminer, Roni Levy, Zeev Broude, Ilia Gilderman
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups (two locations) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (data partitions) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (one second) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060190243A1
CLAIM 5
. The data access system of claim 4 , wherein data is assigned in the form of records having a primary key and at least one secondary (first data, first data set) keys , and wherein the hashing process is carried out on said primary key .

US20060190243A1
CLAIM 8
. The data access system of claim 1 , wherein data is replicated at least once over at least two data partitions (data partitions) .

US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (two locations) .
US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step (odd number) of the reducing step further comprises processing data that is not intermediate data .
US20060190243A1
CLAIM 20
. The data access system of claim 19 , wherein said number being at least three is an odd number (intermediate data processing step) , thereby allowing majority voting between said copied virtual partitions to ensure integrity of said data .
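
To make the mapped reference language concrete, the majority voting recited in claim 20 of US20060190243A1 can be sketched as below. This is an illustrative assumption about how a vote over an odd number of copies (at least three) might be taken; the function name `majority_read` and the sample values are hypothetical and are not taken from the reference. The sketch involves replica integrity only and has no reduce step.

```python
from collections import Counter


def majority_read(copies):
    """Return the value held by a strict majority of the copies.

    copies: values read from an odd number (at least three) of replicas.
    """
    if len(copies) < 3 or len(copies) % 2 == 0:
        raise ValueError("expected an odd number of copies, at least three")
    value, votes = Counter(copies).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise LookupError("no value is held by a majority of copies")
    return value


VOTED = majority_read(["v1", "v1", "v2"])
```

Here `VOTED` is `"v1"`, because two of the three copies agree.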

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (two locations) .
US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (two locations) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (data partitions) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (one second) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060190243A1
CLAIM 5
. The data access system of claim 4 , wherein data is assigned in the form of records having a primary key and at least one second (first data, first data set) ary keys , and wherein the hashing process is carried out on said primary key .

US20060190243A1
CLAIM 8
. The data access system of claim 1 , wherein data is replicated at least once over at least two data partitions (data partitions) .

US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (two locations) .
US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .

US8190610B2
CLAIM 32
. The computer system of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (two locations) .
US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (one second) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data partitions) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060190243A1
CLAIM 5
. The data access system of claim 4 , wherein data is assigned in the form of records having a primary key and at least one second (first data, first data set) ary keys , and wherein the hashing process is carried out on said primary key .

US20060190243A1
CLAIM 8
. The data access system of claim 1 , wherein data is replicated at least once over at least two data partitions (data partitions) .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (two locations) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (one second) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data partitions) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060190243A1
CLAIM 5
. The data access system of claim 4 , wherein data is assigned in the form of records having a primary key and at least one second (first data, first data set) ary keys , and wherein the hashing process is carried out on said primary key .

US20060190243A1
CLAIM 8
. The data access system of claim 1 , wherein data is replicated at least once over at least two data partitions (data partitions) .

US20060190243A1
CLAIM 41
. The method of claim 40 , comprising copying individual data items to at least two locations (data groups) in said data storage resource and providing a group address said at least two locations .




US20060031268A1

Filed: 2005-09-13     Issued: 2006-02-09

Systems and methods for the repartitioning of data

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

David Shutt, Elizabeth Nichols
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system, one second) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (said system, one second) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (readable instructions) are performed by a distributed system .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 38
. A computer-readable medium for use with a federation of servers , the computer-readable medium comprising computer-readable instructions (reducing operations) for : determining a quantity of logical partitions to be moved to a new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
suspending writes to selected logical partitions ;
copying selected logical partitions to the new physical partition ;
redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and removing the original selected logical partitions that were copied to the new physical partition ;
wherein the new physical partition comprises a primary data structure , for storing primary data , residing on a first server of the federation of servers and a secondary data structure , for storing a backup of the primary data , residing on a second server of the federation of servers ;
and wherein the selected logical partitions comprise a subset of the primary data in the primary data structure and a corresponding subset of the backup of the primary data in the secondary data structure .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .
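
For comparison with the map and reduce steps of the target claim, the repartitioning sequence recited in claims 29 and 38 of US20060031268A1 (suspend writes into a buffer, copy, redirect, remove the originals) can be sketched as below. This is a simplified illustration under stated assumptions, not the reference's implementation: the class `Federation`, its method names, and the dictionary layout are hypothetical; the caller chooses which logical partitions to move; and the primary/secondary backup structures of claim 38 are omitted.

```python
class Federation:
    """Toy model: physical partitions hold logical partitions of key/value data."""

    def __init__(self, physical):
        # physical: {physical_name: {logical_id: {key: value}}}
        self.physical = physical
        self.route = {
            lid: name for name, logicals in physical.items() for lid in logicals
        }
        self.buffers = {}  # logical_id -> writes held while suspended

    def write(self, lid, key, value):
        if lid in self.buffers:
            # Writes to a suspended logical partition are held in a buffer.
            self.buffers[lid].append((key, value))
        else:
            self.physical[self.route[lid]][lid][key] = value

    def read(self, lid, key):
        return self.physical[self.route[lid]][lid].get(key)

    def move(self, lids, new_name, during_copy=None):
        self.physical.setdefault(new_name, {})
        for lid in lids:                      # 1. suspend writes
            self.buffers[lid] = []
        for lid in lids:                      # 2. copy to the new physical partition
            source = self.physical[self.route[lid]][lid]
            self.physical[new_name][lid] = dict(source)
        if during_copy is not None:           # hook to simulate concurrent writes
            during_copy()
        for lid in lids:
            old_name = self.route[lid]
            self.route[lid] = new_name        # 3. redirect reads and writes
            for key, value in self.buffers.pop(lid):
                self.physical[new_name][lid][key] = value  # replay the buffer
            del self.physical[old_name][lid]  # 4. remove the original copy


FED = Federation({"p1": {0: {"a": 1}, 1: {"b": 2}}})
FED.move([1], "p2", during_copy=lambda: FED.write(1, "c", 3))
```

After the move, logical partition 1 is served from `p2`, including the write that arrived during the copy. The sequence relocates stored data between servers; it applies no user-configurable map function and merges no intermediate data on a common key.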

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system, one second) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (said system, one second) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (said system, one second) having a plurality of first key-value pairs , wherein such first data set belongs to a first data (said system, one second) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (readable instructions) are performed by a distributed system .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 38
. A computer-readable medium for use with a federation of servers , the computer-readable medium comprising computer-readable instructions (reducing operations) for : determining a quantity of logical partitions to be moved to a new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
suspending writes to selected logical partitions ;
copying selected logical partitions to the new physical partition ;
redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and removing the original selected logical partitions that were copied to the new physical partition ;
wherein the new physical partition comprises a primary data structure , for storing primary data , residing on a first server of the federation of servers and a secondary data structure , for storing a backup of the primary data , residing on a second server of the federation of servers ;
and wherein the selected logical partitions comprise a subset of the primary data in the primary data structure and a corresponding subset of the backup of the primary data in the secondary data structure .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system, one second) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .
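
The reducing step of claims 33 and 34 above (iterating on keys found in both intermediate data sets and emitting an output with a schema different from either input) can be illustrated with the short sketch below. It is a hedged example only: the sample pairs, the output fields `name` and `total`, and the function names are hypothetical and are not drawn from the patent or the cited reference.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect intermediate key-value pairs into key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_together(first_pairs, second_pairs):
    """Merge two intermediate data sets on the keys they have in common."""
    first = group_by_key(first_pairs)
    second = group_by_key(second_pairs)
    output = []
    # Only keys present in both sets are merged, so the output covers
    # a portion of each intermediate data set.
    for key in sorted(first.keys() & second.keys()):
        total = sum(second[key])
        for name in first[key]:
            output.append({"key": key, "name": name, "total": total})
    return output


FIRST = [("u1", "alice"), ("u2", "bob"), ("u3", "carol")]   # (key, name)
SECOND = [("u1", 5), ("u1", 7), ("u3", 2), ("u4", 9)]       # (key, amount)
MERGED = reduce_together(FIRST, SECOND)
```

With the sample data, `MERGED` holds two records, for `u1` (total 12) and `u3` (total 2). Keys `u2` and `u4` appear in only one intermediate set and are dropped, and each output record has the schema (key, name, total), which matches neither input schema.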

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system, one second) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system, one second) are provided to all of the reducers .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .
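
Claims 35 and 36 above recite distributing the intermediate data over several reducers, with some key-value pairs (claim 35) or all of them (claim 36) provided to more than one partition. A minimal sketch of one way to read this follows, assuming the first intermediate set is hash-partitioned by key and the second is copied to every reducer partition. The function name, the `broadcast_first` flag, and the use of CRC32 as the partitioning hash are assumptions made for illustration, not details from the patent.

```python
import zlib


def partition_for_reducers(first_pairs, second_pairs, n_reducers,
                           broadcast_first=False):
    """Build one input partition per reducer.

    The second intermediate set is copied to every partition. The first set
    is hash-partitioned by key unless broadcast_first is True, in which case
    it is also copied to every partition (the claim 36 situation).
    """
    parts = [{"first": [], "second": list(second_pairs)}
             for _ in range(n_reducers)]
    for key, value in first_pairs:
        if broadcast_first:
            for part in parts:
                part["first"].append((key, value))
        else:
            # CRC32 gives a hash that is stable across runs.
            index = zlib.crc32(key.encode("utf-8")) % n_reducers
            parts[index]["first"].append((key, value))
    return parts


FIRST = [("u1", "alice"), ("u2", "bob"), ("u3", "carol")]
SECOND = [("u1", 5), ("u3", 2)]
PARTS = partition_for_reducers(FIRST, SECOND, 2)
ALL_TO_ALL = partition_for_reducers(FIRST, SECOND, 2, broadcast_first=True)
```

In `PARTS`, each pair of the first set lands in exactly one partition while every partition receives the whole second set. In `ALL_TO_ALL`, every reducer receives every pair of both sets.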

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system, one second) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data (said system, one second) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system, one second) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system, one second) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system, one second) are provided to all of the reducers .
US20060031268A1
CLAIM 29
. The system of claim 26 wherein the means for suspending writes to selected logical partitions comprises means for holding the writes in a buffer and redirecting that buffer to the logical partitions in the new physical partition , and said system (data set, first data set, second data set, first data) further comprising a means for re-enabling writes to the logical partitions in the new physical partition .

US20060031268A1
CLAIM 48
. The computer-readable medium of claim 38 , wherein the physical partition further comprises additional secondary data structures residing on additional servers of the federation of servers wherein no more than one second (data set, first data set, second data set, first data) ary data structure resides on a single server .




US20050210082A1

Filed: 2005-05-09     Issued: 2005-09-22

Systems and methods for the repartitioning of data

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

David Shutt, Elizabeth Nichols
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (readable instructions) are performed by a distributed system .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US20050210082A1
CLAIM 41
. A computer-readable medium for use with a federation of servers , said computer-readable medium comprising computer-readable instructions (reducing operations) for : determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
creating a replication stream for selected logical partitions to the new physical partition ;
copying selected logical partitions to the new physical partition ;
deconflicting inconsistencies in the logical partitions on the new physical partition ;
redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .
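
As a reading aid for the US8190610B2 claim 17 language charted above (per-group mapping, intermediate data identifiable to its data group, and a reduce that merges on the key in common), the following is a minimal single-process Python sketch. It is not taken from either patent and is not a claim construction. The two schemas, the group names "employee" and "department", and every function name are hypothetical; partitions are processed sequentially here, whereas the claim recites a distributed system.

```python
from collections import defaultdict

def map_employee(record):
    # Hypothetical first schema: (emp_id, name, dept_id)
    emp_id, name, dept_id = record
    yield dept_id, name

def map_department(record):
    # Hypothetical second schema: (dept_id, title)
    dept_id, title = record
    yield dept_id, title

def split(records, n):
    # Partition one data group into n data partitions (round-robin).
    return [records[i::n] for i in range(n)]

def run_map(groups, n_partitions=2):
    # Intermediate data is stored per key and per group, so each list of
    # values stays identifiable to the data group that produced it.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, (records, map_fn) in groups.items():
        for part in split(records, n_partitions):
            for record in part:
                for key, value in map_fn(record):
                    intermediate[key][group].append(value)
    return intermediate

def reduce_merge(key, values_by_group):
    # Each group's values are handled in a group-specific way and then
    # merged on the key the two schemas have in common (dept_id).
    for name in values_by_group.get("employee", []):
        for title in values_by_group.get("department", []):
            yield (key, name, title)

def map_reduce(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_merge(key, intermediate[key]))
    return output

GROUPS = {
    "employee": ([(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)], map_employee),
    "department": ([(10, "Sales"), (20, "Ops")], map_department),
}
RESULT = map_reduce(GROUPS)
print(RESULT)
```

The output rows have a schema (dept_id, name, title) that differs from both input schemas, which is the property claims 33 and 40 recite for the output data set.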

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (readable instructions) are performed by a distributed system .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US20050210082A1
CLAIM 41
. A computer-readable medium for use with a federation of servers , said computer-readable medium comprising computer-readable instructions (reducing operations) for : determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
creating a replication stream for selected logical partitions to the new physical partition ;
copying selected logical partitions to the new physical partition ;
deconflicting inconsistencies in the logical partitions on the new physical partition ;
redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .
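
The reducer-partitioning language of US8190610B2 claims 35 and 36 charted above can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: integer keys, a modulo routing rule, and the names `partition_for_reducers`, `broadcast_keys`, and `broadcast_all` are all hypothetical and appear in neither patent. With `broadcast_keys`, some key-value pairs are provided to more than one partition (claim 35); with `broadcast_all=True`, every pair is provided to every reducer (claim 36).

```python
def partition_for_reducers(pairs, n_reducers, broadcast_keys=frozenset(), broadcast_all=False):
    # One partition of intermediate key-value pairs per reducer.
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if broadcast_all or key in broadcast_keys:
            # This pair is copied into every partition.
            targets = range(n_reducers)
        else:
            # Assumed routing rule for integer keys; real systems may differ.
            targets = [key % n_reducers]
        for t in targets:
            partitions[t].append((key, value))
    return partitions

PAIRS = [(1, "a"), (2, "b"), (3, "c")]
SOME_SHARED = partition_for_reducers(PAIRS, 2, broadcast_keys={3})
ALL_SHARED = partition_for_reducers(PAIRS, 2, broadcast_all=True)
print(SOME_SHARED)
print(ALL_SHARED)
```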

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20050210082A1
CLAIM 32
. A system for balancing load on a federation of servers , said system (data set, first data set, second data set) comprising : means for determining a quantity of logical partitions to be moved to the new physical partition and selecting the specific logical partitions to be moved to the new physical partition ;
means for creating a replication stream for selected logical partitions to the new physical partition ;
means for copying selected logical partitions to the new physical partition ;
means for deconflicting inconsistencies in the logical partitions on the new physical partition ;
means for redirecting reads and writes for the selected logical partitions to the logical partitions in the new physical partition ;
and means for removing the original selected logical partitions that were copied to the new physical partition .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060095481A1

Filed: 2004-11-04     Issued: 2006-05-04

Method and system for partition level cleanup of replication conflict metadata

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Ram Singh, Philip Vaughn
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (rising time) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (rising time) of values are output for the corresponding different intermediate data (rising time) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (executing instructions) are performed by a distributed system .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US20060095481A1
CLAIM 5
. A system for replicating data which includes a metadata management , the system comprising : a database ;
at least two nodes , the nodes capable of downloading partitions of the database data ;
means for executing instructions (reducing operations) , the instructions performing a method comprising : establishing a metadata retention period for nodes in the system , wherein once the metadata retention period expires for a node , metadata concerning conflict resolution for the node is discarded ;
associating data changes with nodes , wherein a change to a portion of database data made by the node is mapped to other interested nodes , such that the node and the other interested nodes can receive updated changes to the portion of database data by generating a change-to-node mapping ;
removing conflict resolution metadata corresponding to the node if a metadata retention period for the node has expired and other interested nodes are included in the association of data changes ;
and avoiding the removal of conflict resolution metadata in any node for which other interested nodes are absent .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (rising time) is a plurality of output data groups .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (rising time) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (rising time) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (rising time) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .
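
For orientation on the US8190610B2 claim 5 language charted above ("employing an iterator that corresponds to that data group"), here is a minimal Python sketch of a reduce call that receives one iterator per data group for a single key. It is an assumption-laden illustration, not the patentee's implementation: the group names, the sample values, and the functions `make_group_iterators` and `reduce_for_key` are hypothetical.

```python
def make_group_iterators(values_by_group):
    # One independent iterator per data group for the key being reduced.
    return {group: iter(values) for group, values in values_by_group.items()}

def reduce_for_key(key, iterators):
    # The department iterator is drained and buffered first so that each
    # employee value can be paired with every department value.
    titles = list(iterators.get("department", iter(())))
    for name in iterators.get("employee", iter(())):
        for title in titles:
            yield (key, name, title)

# Hypothetical intermediate values for the common key 10, kept per group.
INTERMEDIATE_FOR_KEY_10 = {"employee": ["Ann", "Cy"], "department": ["Sales"]}
MERGED = list(reduce_for_key(10, make_group_iterators(INTERMEDIATE_FOR_KEY_10)))
print(MERGED)
```

Buffering one group and streaming the other is only one possible arrangement; the abstract notes that iterators can be arranged inside reduce functions however desired.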

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (rising time) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (rising time) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (rising time) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (rising time) of values are output for the corresponding different intermediate data (rising time) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (rising time) is a plurality of output data groups .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (rising time) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (rising time) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (rising time) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (rising time) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (rising time) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (rising time) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (executing instructions) are performed by a distributed system .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .

US20060095481A1
CLAIM 5
. A system for replicating data which includes a metadata management , the system comprising : a database ;
at least two nodes , the nodes capable of downloading partitions of the database data ;
means for executing instructions (reducing operations) , the instructions performing a method comprising : establishing a metadata retention period for nodes in the system , wherein once the metadata retention period expires for a node , metadata concerning conflict resolution for the node is discarded ;
associating data changes with nodes , wherein a change to a portion of database data made by the node is mapped to other interested nodes , such that the node and the other interested nodes can receive updated changes to the portion of database data by generating a change-to-node mapping ;
removing conflict resolution metadata corresponding to the node if a metadata retention period for the node has expired and other interested nodes are included in the association of data changes ;
and avoiding the removal of conflict resolution metadata in any node for which other interested nodes are absent .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (rising time) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060095481A1
CLAIM 3
. The method of claim 1 , further comprising time (different lists, different intermediate data, data group, second data group, corresponding different intermediate data) tagging the generation of each change to database data made by a node .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050222980A1

Filed: 2004-03-31     Issued: 2005-10-06

Fragment elimination

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Evan Lee
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (coding program) of values are output for the corresponding different intermediate data (coding program) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050222980A1
CLAIM 16
. The fragmented database as set forth in claim 14 , wherein the one or more fragmentation dimension basis functions comprise : a first fragmentation dimension basis function depending upon at least a first data (first data) base field ;
and a second fragmentation dimension basis function depending upon at least the first database field .

US20050222980A1
CLAIM 19
. A storage medium encoding program (different lists, different intermediate data) code for performing database functions , the program code comprising : program code for constructing a fragmented database having a fragmentation scheme constructed using fragmentation dimension basis functions , each fragmentation dimension basis function depending upon at least one database field ;
and program code for inserting a new record into the fragmented database , the inserting including (i) computing values of the fragmentation dimension basis functions using fields of the new record , (ii) selecting a target database fragment based on the fragmentation scheme and the computed values of the fragmentation dimension basis functions , and (iii) inserting the new record into the target database fragment .
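For orientation when weighing the mapping above, the grouped map-reduce recited in claim 1 of US8190610B2 can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patentee's implementation and not a claim construction: the employee and department records, the shared key `dept_id`, and every function name are hypothetical, and the "distributed system" is simulated in a single process.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
employees = [("e1", "Ann", "d1"), ("e2", "Bob", "d2"), ("e3", "Cy", "d1")]  # (emp_id, name, dept_id)
departments = [("d1", "Sales"), ("d2", "Ops")]                              # (dept_id, dept_name)

def map_employee(record):
    """Map function for the employee group: emit (dept_id, name)."""
    _emp_id, name, dept_id = record
    return dept_id, name

def map_department(record):
    """Map function for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    return dept_id, dept_name

def partition(data, n):
    """Split one data group into n partitions (round-robin)."""
    return [data[i::n] for i in range(n)]

def run_map(data, map_fn, n_partitions=2):
    """Apply a group-specific map function to each partition and
    collect a list of values per key as that group's intermediate data."""
    intermediate = defaultdict(list)
    for part in partition(data, n_partitions):
        for record in part:
            key, value = map_fn(record)
            intermediate[key].append(value)
    return dict(intermediate)

def reduce_groups(emp_inter, dept_inter):
    """Iterate over each group's values separately and merge on the common key."""
    output = []
    for key in sorted(set(emp_inter) & set(dept_inter)):
        for dept_name in dept_inter[key]:      # iterator over the department group
            for name in emp_inter[key]:        # iterator over the employee group
                output.append((name, dept_name))
    return output

# Intermediate data stays identifiable to its data group.
intermediate = {
    "employee": run_map(employees, map_employee),
    "department": run_map(departments, map_department),
}
merged = reduce_groups(intermediate["employee"], intermediate["department"])
```

The output records `(name, dept_name)` have a schema that differs from both inputs, which is the property the claim chart needs a reference to disclose.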

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (coding program) of values are output for the corresponding different intermediate data (coding program) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050222980A1
CLAIM 16
. The fragmented database as set forth in claim 14 , wherein the one or more fragmentation dimension basis functions comprise : a first fragmentation dimension basis function depending upon at least a first data (first data) base field ;
and a second fragmentation dimension basis function depending upon at least the first database field .

US20050222980A1
CLAIM 19
. A storage medium encoding program (different lists, different intermediate data) code for performing database functions , the program code comprising : program code for constructing a fragmented database having a fragmentation scheme constructed using fragmentation dimension basis functions , each fragmentation dimension basis function depending upon at least one database field ;
and program code for inserting a new record into the fragmented database , the inserting including (i) computing values of the fragmentation dimension basis functions using fields of the new record , (ii) selecting a target database fragment based on the fragmentation scheme and the computed values of the fragmentation dimension basis functions , and (iii) inserting the new record into the target database fragment .
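The three insertion steps recited in claim 19 of US20050222980A1, restated above, can be sketched as below. This is a minimal sketch under assumptions, not the reference's implementation: the record fields `order_year` and `region`, the two basis functions, and the scheme of one fragment per tuple of computed values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fragmentation dimension basis functions; each depends on
# at least one field of the record.
basis_functions = [
    lambda rec: rec["order_year"] // 10 * 10,  # decade of the order year
    lambda rec: rec["region"],                 # region taken as-is
]

# Hypothetical fragmentation scheme: one fragment per tuple of basis values.
fragments = defaultdict(list)

def insert_record(rec):
    """Insert rec into the fragment selected by its basis-function values."""
    values = tuple(f(rec) for f in basis_functions)  # (i) compute basis values
    target = values                                  # (ii) select target fragment
    fragments[target].append(rec)                    # (iii) insert the record
    return target

first = insert_record({"order_year": 2004, "region": "EU"})
second = insert_record({"order_year": 2009, "region": "EU"})
third = insert_record({"order_year": 2011, "region": "US"})
```

Note that nothing in this sketch forms per-group intermediate key-value lists or merges them in a reduce step, which is relevant when assessing how closely the reference tracks the target claim.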

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050222980A1
CLAIM 13
. The method as set forth in claim 1 , wherein the processing of a database query comprises : recognizing the query as a row insert or row update operation including a plurality of new record fields corresponding to database fields of the database ;
computing fragmentation dimension value (output data set) s corresponding to the fragmentation dimension basis functions using the new record fields as inputs ;
inserting or updating using the new record fields in an identified one of the database fragments whose corresponding fragmentation expression is satisfied by the computed fragmentation dimension values .

US20050222980A1
CLAIM 16
. The fragmented database as set forth in claim 14 , wherein the one or more fragmentation dimension basis functions comprise : a first fragmentation dimension basis function depending upon at least a first data (first data) base field ;
and a second fragmentation dimension basis function depending upon at least the first database field .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20050222980A1
CLAIM 13
. The method as set forth in claim 1 , wherein the processing of a database query comprises : recognizing the query as a row insert or row update operation including a plurality of new record fields corresponding to database fields of the database ;
computing fragmentation dimension value (output data set) s corresponding to the fragmentation dimension basis functions using the new record fields as inputs ;
inserting or updating using the new record fields in an identified one of the database fragments whose corresponding fragmentation expression is satisfied by the computed fragmentation dimension values .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050222980A1
CLAIM 13
. The method as set forth in claim 1 , wherein the processing of a database query comprises : recognizing the query as a row insert or row update operation including a plurality of new record fields corresponding to database fields of the database ;
computing fragmentation dimension value (output data set) s corresponding to the fragmentation dimension basis functions using the new record fields as inputs ;
inserting or updating using the new record fields in an identified one of the database fragments whose corresponding fragmentation expression is satisfied by the computed fragmentation dimension values .

US20050222980A1
CLAIM 16
. The fragmented database as set forth in claim 14 , wherein the one or more fragmentation dimension basis functions comprise : a first fragmentation dimension basis function depending upon at least a first data (first data) base field ;
and a second fragmentation dimension basis function depending upon at least the first database field .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20050222980A1
CLAIM 13
. The method as set forth in claim 1 , wherein the processing of a database query comprises : recognizing the query as a row insert or row update operation including a plurality of new record fields corresponding to database fields of the database ;
computing fragmentation dimension value (output data set) s corresponding to the fragmentation dimension basis functions using the new record fields as inputs ;
inserting or updating using the new record fields in an identified one of the database fragments whose corresponding fragmentation expression is satisfied by the computed fragmentation dimension values .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050015546A1

Filed: 2004-03-24     Issued: 2005-01-20

Data storage system

(Original Assignee) XIV Ltd     (Current Assignee) International Business Machines Corp

Ofir Zohar, Yaron Revah, Haim Helman, Dror Cohen, Shemer Schwartz
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (more interface) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050015546A1
CLAIM 16
. A data storage system , comprising : one or more mass-storage devices , coupled to store partitions of data at respective first ranges of logical addresses (LAs) ;
a plurality of interim devices , configured to operate independently of one another , each interim device being assigned a respective second range of the LAs and coupled to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
and one or more interface (mapping functions) s , which are adapted to receive input/output (IO) requests from host processors , to identify specified partitions of data in response to the IO requests , to convert the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data , and to direct all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US20050015546A1
CLAIM 35
. A method for storing data , comprising : coupling one or more mass-storage devices to store partitions of data at respective first ranges of logical addresses (LAs) ;
configuring a plurality of interim devices to operate independently of one another ;
assigning each interim device a respective second range of the LAs ;
coupling each interim device to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
receiving input (different schema) /output (IO) requests from host processors ;
identifying specified partitions of data in response to the IO requests ;
converting the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data ;
and directing all the converted-IO-requests to the interim device to which the specified LAs are assigned .
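The request routing recited in claim 35 of US20050015546A1 (identify a partition, convert the request to a logical address, direct it to the interim device assigned that address range) can be sketched as follows. This is a minimal sketch under assumptions: the partition size, the range boundaries, the device names, and the partition-to-LA conversion are hypothetical and are not taken from the reference.

```python
from bisect import bisect_right

PARTITION_SIZE = 100                  # hypothetical number of LAs per partition
range_starts = [0, 1000, 2000]        # hypothetical start LA of each interim device's range
device_names = ["interim-A", "interim-B", "interim-C"]

def convert_request(partition_no):
    """Convert an IO request for a partition into its starting logical address."""
    return partition_no * PARTITION_SIZE

def route(partition_no):
    """Return the interim device whose LA range contains the converted request."""
    la = convert_request(partition_no)
    idx = bisect_right(range_starts, la) - 1  # last range whose start is <= la
    return device_names[idx], la

r0 = route(0)
r10 = route(10)
r25 = route(25)
```

The routing here depends only on an address range lookup; it does not involve schemas or keys, which bears on whether "receiving input" can fairly be read onto "different schema".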

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
US20050015546A1
CLAIM 35
. A method for storing data , comprising : coupling one or more mass-storage devices to store partitions of data at respective first ranges of logical addresses (LAs) ;
configuring a plurality of interim devices to operate independently of one another ;
assigning each interim device a respective second range of the LAs ;
coupling each interim device to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
receiving input (different schema) /output (IO) requests from host processors ;
identifying specified partitions of data in response to the IO requests ;
converting the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data ;
and directing all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (more interface) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050015546A1
CLAIM 16
. A data storage system , comprising : one or more mass-storage devices , coupled to store partitions of data at respective first ranges of logical addresses (LAs) ;
a plurality of interim devices , configured to operate independently of one another , each interim device being assigned a respective second range of the LAs and coupled to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
and one or more interface (mapping functions) s , which are adapted to receive input/output (IO) requests from host processors , to identify specified partitions of data in response to the IO requests , to convert the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data , and to direct all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US20050015546A1
CLAIM 35
. A method for storing data , comprising : coupling one or more mass-storage devices to store partitions of data at respective first ranges of logical addresses (LAs) ;
configuring a plurality of interim devices to operate independently of one another ;
assigning each interim device a respective second range of the LAs ;
coupling each interim device to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
receiving input (different schema) /output (IO) requests from host processors ;
identifying specified partitions of data in response to the IO requests ;
converting the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data ;
and directing all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
US20050015546A1
CLAIM 35
. A method for storing data , comprising : coupling one or more mass-storage devices to store partitions of data at respective first ranges of logical addresses (LAs) ;
configuring a plurality of interim devices to operate independently of one another ;
assigning each interim device a respective second range of the LAs ;
coupling each interim device to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
receiving input (different schema) /output (IO) requests from host processors ;
identifying specified partitions of data in response to the IO requests ;
converting the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data ;
and directing all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (receiving input) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (more interface) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (group number) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050015546A1
CLAIM 4
. The method according to claim 1 , wherein the first plurality of groups comprises s groups each having a different integral group number (first set) between 1 and s , wherein the number comprises an integer r randomly chosen from and including integers between 0 and s-1 , wherein the sequential partition number comprises a positive integer p , and wherein the group number of the assigned specific group is (r+p)modulo(s) if (r+p)modulo(s)≠0 , and s if (r+p)modulo(s)=0 .
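The group-number rule in claim 4 of US20050015546A1 is simple arithmetic and can be checked with a short sketch. The function name `assigned_group` is hypothetical; the symbols `r`, `p`, and `s` follow the claim text.

```python
def assigned_group(r: int, p: int, s: int) -> int:
    """Group number per the rule in claim 4: (r+p) mod s, or s when that is 0.

    r: integer chosen from 0..s-1
    p: positive sequential partition number
    s: number of groups, numbered 1..s
    """
    g = (r + p) % s
    return g if g != 0 else s

# With s = 4 groups and r = 3, partitions 1..5 cycle through groups 4, 1, 2, 3, 4.
cycle = [assigned_group(3, p, 4) for p in range(1, 6)]
```

The rule therefore spreads consecutive partitions over the groups 1 to s in round-robin order, starting from an offset set by r.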

US20050015546A1
CLAIM 16
. A data storage system , comprising : one or more mass-storage devices , coupled to store partitions of data at respective first ranges of logical addresses (LAs) ;
a plurality of interim devices , configured to operate independently of one another , each interim device being assigned a respective second range of the LAs and coupled to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
and one or more interface (mapping functions) s , which are adapted to receive input/output (IO) requests from host processors , to identify specified partitions of data in response to the IO requests , to convert the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data , and to direct all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US20050015546A1
CLAIM 35
. A method for storing data , comprising : coupling one or more mass-storage devices to store partitions of data at respective first ranges of logical addresses (LAs) ;
configuring a plurality of interim devices to operate independently of one another ;
assigning each interim device a respective second range of the LAs ;
coupling each interim device to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
receiving input (different schema) /output (IO) requests from host processors ;
identifying specified partitions of data in response to the IO requests ;
converting the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data ;
and directing all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (more interface) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (group number) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (receiving input) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050015546A1
CLAIM 4
. The method according to claim 1 , wherein the first plurality of groups comprises s groups each having a different integral group number (first set) between 1 and s , wherein the number comprises an integer r randomly chosen from and including integers between 0 and s-1 , wherein the sequential partition number comprises a positive integer p , and wherein the group number of the assigned specific group is (r+p)modulo(s) if (r+p)modulo(s)≠0 , and s if (r+p)modulo(s)=0 .
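
The group-number rule recited in claim 4 of US20050015546A1 can be worked through as a short function. This is only an illustrative sketch: the function name `assigned_group` and the sample values are assumptions introduced here and do not appear in the reference.

```python
def assigned_group(r: int, p: int, s: int) -> int:
    """Group number for sequential partition p, following the quoted rule.

    r is the randomly chosen integer in [0, s-1], p is a positive
    sequential partition number, and s is the number of groups (1..s).
    """
    if s < 1 or not 0 <= r <= s - 1 or p < 1:
        raise ValueError("expected s >= 1, 0 <= r <= s-1 and p >= 1")
    g = (r + p) % s
    # A remainder of 0 maps to group s, so the result always lies in 1..s.
    return g if g != 0 else s


if __name__ == "__main__":
    # Hypothetical values: s = 4 groups, random offset r = 2.
    print([assigned_group(2, p, 4) for p in range(1, 7)])  # [3, 4, 1, 2, 3, 4]
```

With these sample values, consecutive partition numbers cycle through all four groups, which is the effect of the modulo rule.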

US20050015546A1
CLAIM 16
. A data storage system , comprising : one or more mass-storage devices , coupled to store partitions of data at respective first ranges of logical addresses (LAs) ;
a plurality of interim devices , configured to operate independently of one another , each interim device being assigned a respective second range of the LAs and coupled to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
and one or more interfaces (mapping functions) , which are adapted to receive input/output (IO) requests from host processors , to identify specified partitions of data in response to the IO requests , to convert the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data , and to direct all the converted-IO-requests to the interim device to which the specified LAs are assigned .

US20050015546A1
CLAIM 35
. A method for storing data , comprising : coupling one or more mass-storage devices to store partitions of data at respective first ranges of logical addresses (LAs) ;
configuring a plurality of interim devices to operate independently of one another ;
assigning each interim device a respective second range of the LAs ;
coupling each interim device to receive the partitions of data from and provide the partitions of data to the one or more mass-storage devices having LAs within the respective second range ;
receiving input (different schema) /output (IO) requests from host processors ;
identifying specified partitions of data in response to the IO requests ;
converting the IO requests to converted-IO-requests directed to specified LAs in response to the specified partitions of data ;
and directing all the converted-IO-requests to the interim device to which the specified LAs are assigned .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040225638A1

Filed: 2004-02-26     Issued: 2004-11-11

Method and system for data mining in high dimensional data spaces

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp ; Kaon Interactive Inc

Reinhold Geiselhart, Christoph Lingenfelder, Janna Orechkina
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (data processing program) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040225638A1
CLAIM 13
. A data processing program (different lists) for execution in a data processing system comprising software code portions for performing a method according to claim 1 , when said program is run on said data processing system .
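
As a reading aid for claim 1 of US8190610B2 quoted above, the following minimal sketch shows two data groups with different schemas being mapped by group-specific functions and then merged in one reduce step on a common key. Everything here is an assumption made for illustration: the sample records, the names `map_employee`, `map_department`, `reduce_join` and `grouped_map_reduce`, and the single-process execution (no distribution is modeled).

```python
from collections import defaultdict

# Hypothetical data groups with different schemas; dept_id is the common key.
EMPLOYEES = [("e1", {"name": "Ann", "dept_id": 10}),
             ("e2", {"name": "Bob", "dept_id": 20})]
DEPARTMENTS = [(10, {"dept_name": "Search"}),
               (20, {"dept_name": "Ads"})]


def map_employee(key, value):
    # Group-specific mapper: re-key each employee by the common key.
    yield value["dept_id"], value["name"]


def map_department(key, value):
    # Department records are already keyed by dept_id.
    yield key, value["dept_name"]


def partition(records, n):
    # Split one data group into n partitions (round-robin).
    return [records[i::n] for i in range(n)]


def reduce_join(key, employees, departments):
    # One iterator per data group; output rows have a new, merged schema.
    names = list(employees)
    for dept_name in departments:
        for name in names:
            yield {"dept_id": key, "dept_name": dept_name, "employee": name}


def grouped_map_reduce():
    # Intermediate data stays identifiable to its group: {key: {group: [values]}}.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records, mapper in (("employee", EMPLOYEES, map_employee),
                                   ("department", DEPARTMENTS, map_department)):
        for part in partition(records, 2):
            for key, value in part:
                for out_key, out_value in mapper(key, value):
                    intermediate[out_key][group].append(out_value)
    output = []
    for key in sorted(intermediate):
        groups = intermediate[key]
        output.extend(reduce_join(key,
                                  iter(groups["employee"]),
                                  iter(groups["department"])))
    return output


if __name__ == "__main__":
    for row in grouped_map_reduce():
        print(row)
```

The sketch keeps each group's intermediate values in a separate list per key so that the reduce step can treat each group in its own way before merging.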

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (data processing program) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US20040225638A1
CLAIM 13
. A data processing program (different lists) for execution in a data processing system comprising software code portions for performing a method according to claim 1 , when said program is run on said data processing system .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .
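
Claims 23, 24 and 25 of US8190610B2 differ in how intermediate key/value pairs are distributed to reducer partitions. The sketch below contrasts three possible readings: each pair to exactly one partition, each pair copied to more than one partition, and every pair sent to all partitions. The function names, the CRC-based slot choice and the `copies` parameter are assumptions made for illustration only; the patent text quoted here does not prescribe them.

```python
import zlib


def _slot(key, n):
    # Deterministic partition index for a key (illustrative choice).
    return zlib.crc32(repr(key).encode("utf-8")) % n


def partition_exclusive(pairs, n):
    # One reading of claim 23: each pair goes to exactly one partition.
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[_slot(key, n)].append((key, value))
    return parts


def partition_replicated(pairs, n, copies):
    # One reading of claim 24: pairs are provided to more than one partition.
    if not 1 <= copies <= n:
        raise ValueError("copies must be between 1 and n")
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        start = _slot(key, n)
        for offset in range(copies):
            parts[(start + offset) % n].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    # One reading of claim 25: all pairs are provided to all partitions.
    return [list(pairs) for _ in range(n)]


if __name__ == "__main__":
    pairs = [(1, "a"), (2, "b"), (3, "c")]
    print(partition_exclusive(pairs, 2))
    print(partition_replicated(pairs, 3, 2))
    print(partition_broadcast(pairs, 2))
```

Each resulting partition would then be handed to a separate reducer, as the claims recite.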

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .
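
Claims 21 and 28 of US8190610B2 refer to an iterator that corresponds to a data group and that provides associated metadata to the reducing. A minimal sketch of that idea follows. The class name `GroupIterator`, the function `reduce_with_metadata` and the `sorted` metadata flag are hypothetical names introduced here, not elements taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class GroupIterator:
    """Iterates over one data group's intermediate values for a key."""
    group: str
    values: List[Any]
    # Hypothetical metadata, e.g. whether an earlier stage already sorted the values.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __iter__(self) -> Iterator[Any]:
        return iter(self.values)


def reduce_with_metadata(key, iterators):
    # The reducer handles each group according to the metadata its iterator carries.
    summary = {"key": key}
    for it in iterators:
        values = list(it)
        if it.metadata.get("sorted"):
            summary[it.group] = values          # trust the upstream sort
        else:
            summary[it.group] = sorted(values)  # sort here instead
    return summary


if __name__ == "__main__":
    result = reduce_with_metadata(7, [
        GroupIterator("a", [1, 2], {"sorted": True}),
        GroupIterator("b", [3, 1]),
    ])
    print(result)  # {'key': 7, 'a': [1, 2], 'b': [1, 3]}
```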

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , and the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20040225638A1
CLAIM 12
. A computer system (computer system) comprising means adapted for carrying out the steps of the method according to claim 1 .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050177553A1

Filed: 2004-02-09     Issued: 2005-08-11

Optimized distinct count query system and method

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Alexander Berger, Alexander Balikov
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (data partitions) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050177553A1
CLAIM 22
. A method for executing a distinct count query on a database comprising : pre-aggregating database data ;
determining a minimum and maximum range of a plurality of data partitions (data partitions) ;
identifying independent partition groups to be executed simultaneously with other queried partitions , the independent partition groups including one or more partitions with a non-overlapping range with respect to other queried partitions .
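
Claim 22 of US20050177553A1 turns on finding partitions whose value ranges do not overlap those of other queried partitions. One possible way to picture this, offered purely as an assumption and not as the method of the reference, is to merge partitions with overlapping [min, max] ranges into the same group, so that the resulting groups have mutually non-overlapping ranges. The function name `independent_groups` and the sample ranges are hypothetical.

```python
def independent_groups(ranges):
    """Group partitions so that ranges of different groups never overlap.

    ranges maps a partition name to an inclusive (minimum, maximum) pair.
    Partitions whose ranges overlap, directly or through a chain of
    overlaps, are placed in the same group.
    """
    ordered = sorted(ranges.items(), key=lambda item: item[1])
    groups = []
    current, current_hi = [], None
    for name, (lo, hi) in ordered:
        if current and lo <= current_hi:
            # Overlaps the running group: extend it.
            current.append(name)
            current_hi = max(current_hi, hi)
        else:
            # Starts beyond the running group's maximum: begin a new group.
            if current:
                groups.append(current)
            current, current_hi = [name], hi
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sample = {"p1": (1, 100), "p2": (50, 150), "p3": (200, 300)}
    print(independent_groups(sample))  # [['p1', 'p2'], ['p3']]
```

Under this reading, groups such as `['p1', 'p2']` and `['p3']` could be processed simultaneously because no value can fall in both ranges.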

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (data partitions) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050177553A1
CLAIM 22
. A method for executing a distinct count query on a database comprising : pre-aggregating database data ;
determining a minimum and maximum range of a plurality of data partitions (data partitions) ;
identifying independent partition groups to be executed simultaneously with other queried partitions , the independent partition groups including one or more partitions with a non-overlapping range with respect to other queried partitions .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data partitions) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050177553A1
CLAIM 22
. A method for executing a distinct count query on a database comprising : pre-aggregating database data ;
determining a minimum and maximum range of a plurality of data partitions (data partitions) ;
identifying independent partition groups to be executed simultaneously with other queried partitions , the independent partition groups including one (second set) or more partitions with a non-overlapping range with respect to other queried partitions .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data partitions) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050177553A1
CLAIM 22
. A method for executing a distinct count query on a database comprising : pre-aggregating database data ;
determining a minimum and maximum range of a plurality of data partitions (data partitions) ;
identifying independent partition groups to be executed simultaneously with other queried partitions , the independent partition groups including one (second set) or more partitions with a non-overlapping range with respect to other queried partitions .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050097286A1

Filed: 2003-10-30     Issued: 2005-05-05

Method of instantiating data placement heuristic

(Original Assignee) Hewlett Packard Development Co LP     (Current Assignee) Hewlett Packard Development Co LP

Magnus Karlsson, Christos Karamanolis
US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (upper limit) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050097286A1
CLAIM 27
. The method of claim 19 wherein the placement constraint comprises a storage capacity constraint , which places an upper limit (different key) on a storage capacity for a node .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (upper limit) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050097286A1
CLAIM 27
. The method of claim 19 wherein the placement constraint comprises a storage capacity constraint , which places an upper limit (different key) on a storage capacity for a node .




US20040073545A1

Filed: 2003-10-07     Issued: 2004-04-15

Methods and apparatus for identifying related nodes in a directed graph having named arcs

(Original Assignee) Metatomix Inc     (Current Assignee) OBJECTSTORE Inc

Howard Greenblatt, Alan Greenblatt, David Bigwood, Colin Britton
US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040073545A1
CLAIM 9
. The method of claim 1 , comprising executing step (A) with respect to a first data set representing a first portion (computing devices) of the directed graph , and executing step (A) separately with respect to a second data set representing a second portion of the directed graph .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (second data set) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040073545A1
CLAIM 9
. The method of claim 1 , comprising executing step (A) with respect to a first data set representing a first portion of the directed graph , and executing step (A) separately with respect to a second data set (second data set) representing a second portion of the directed graph .
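
Illustrative note (not drawn from either cited document): the structure recited in claim 33 resembles what is commonly called a reduce-side join. The Python sketch below is a single-process approximation under stated assumptions: the two example data sets, the shared field "dept_id", and every function name are hypothetical, and the partitioning is a simple round-robin split rather than any particular distributed implementation.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the field "dept_id".
EMPLOYEES = [("e1", {"name": "Ann", "dept_id": 10}), ("e2", {"name": "Bob", "dept_id": 20})]
DEPARTMENTS = [("d10", {"dept_id": 10, "title": "Sales"}), ("d20", {"dept_id": 20, "title": "Ops"})]

def map_employee(key, record):
    # First schema: emit (common key, name).
    yield record["dept_id"], record["name"]

def map_department(key, record):
    # Second schema: emit (common key, title).
    yield record["dept_id"], record["title"]

def partition(pairs, n):
    # Round-robin split into n data partitions (an assumption for illustration).
    return [pairs[i::n] for i in range(n)]

def run_map(group, pairs, mapper, n_partitions=2):
    # Apply the group's mapper to each partition and tag output with its group.
    out = []
    for part in partition(pairs, n_partitions):
        for key, record in part:
            for ikey, ivalue in mapper(key, record):
                out.append((ikey, group, ivalue))
    return out

def reduce_join(intermediate):
    # Collect values per common key and per group, then merge the groups.
    grouped = defaultdict(lambda: defaultdict(list))
    for key, group, value in intermediate:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        for name in grouped[key]["emp"]:
            for title in grouped[key]["dept"]:
                output.append((key, name, title))  # output schema differs from both inputs
    return output

def join_employees_departments():
    inter = run_map("emp", EMPLOYEES, map_employee) + run_map("dept", DEPARTMENTS, map_department)
    return reduce_join(inter)
```

Calling join_employees_departments() returns [(10, "Ann", "Sales"), (20, "Bob", "Ops")]: each data group is mapped by its own function, and the reduce step merges the two intermediate sets on the key they have in common.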

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (second data set) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040073545A1
CLAIM 9
. The method of claim 1 , comprising executing step (A) with respect to a first data set representing a first portion (computing devices) of the directed graph , and executing step (A) separately with respect to a second data set (second data set) representing a second portion of the directed graph .




US20040064454A1

Filed: 2003-09-29     Issued: 2004-04-01

Controlled-access database system and method

(Original Assignee) Raf Technology Inc     (Current Assignee) Matthews International Corp

David Ross, Jack Love, Stephen Billester, Brent Smith
US8190610B2
CLAIM 1
. A method of processing data of a data set (access rights) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US20040064454A1
CLAIM 12
. The database system of claim 1 wherein : the one or more data records include a first data (first data) record and a second data (second data) record ;
the first data record employs a first obfuscated format and the second data record employs a second obfuscated format ;
and the second obfuscated format is different than the first obfuscated format .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (access rights) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US20040064454A1
CLAIM 12
. The database system of claim 1 wherein : the one or more data records include a first data (first data) record and a second data (second data) record ;
the first data record employs a first obfuscated format and the second data record employs a second obfuscated format ;
and the second obfuscated format is different than the first obfuscated format .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (access rights) having a plurality of first key-value pairs , wherein such first data set belongs to a first data (first data) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data (second data) group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (access rights) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US20040064454A1
CLAIM 12
. The database system of claim 1 wherein : the one or more data records include a first data (first data) record and a second data (second data) record ;
the first data record employs a first obfuscated format and the second data record employs a second obfuscated format ;
and the second obfuscated format is different than the first obfuscated format .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (access rights) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (access rights) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .
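
Illustrative note (not drawn from either cited document): claims 35 and 36 recite partitioning intermediate data among several reducers such that some, or all, key-value pairs reach more than one partition. A minimal Python sketch follows, assuming integer keys, modulo assignment for non-replicated keys, and a hypothetical replicate_keys parameter that names the keys to be copied to every partition.

```python
def partition_with_replication(pairs, n_reducers, replicate_keys):
    """Split intermediate (key, value) pairs into one partition per reducer.

    Pairs whose key is in replicate_keys are copied to every partition;
    all other pairs go to exactly one partition chosen by key modulo n_reducers.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if key in replicate_keys:
            for part in partitions:
                part.append((key, value))
        else:
            partitions[key % n_reducers].append((key, value))
    return partitions

# Hypothetical intermediate data: key 3 is replicated to both reducers.
pairs = [(1, "a"), (2, "b"), (3, "c")]
some_shared = partition_with_replication(pairs, 2, replicate_keys={3})

# Replicating every key sends all pairs to all reducers (compare claim 36).
all_shared = partition_with_replication(pairs, 2, replicate_keys={1, 2, 3})
```

In this sketch some_shared is [[(2, "b"), (3, "c")], [(1, "a"), (3, "c")]], while all_shared gives each reducer the full list of pairs.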

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (access rights) are provided to all of the reducers .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (access rights) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data (first data) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data (second data) group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (access rights) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US20040064454A1
CLAIM 12
. The database system of claim 1 wherein : the one or more data records include a first data (first data) record and a second data (second data) record ;
the first data record employs a first obfuscated format and the second data record employs a second obfuscated format ;
and the second obfuscated format is different than the first obfuscated format .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (access rights) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (access rights) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (access rights) are provided to all of the reducers .
US20040064454A1
CLAIM 1
. A database system regulating access to one or more data records according to authorized access rights (second set, data set, second data set, second intermediate data set, intermediate data set) , the database system comprising : one or more data crystals storing one or more data records in an obfuscated format ;
one or more iterators , each iterator programmed to access , deobfuscate , and return at least one of the one or more data records in response to a data request ;
one or more queries , each query predefined to receive an indication of an authorized type of data requirement , to request at least one data record from the iterator , and to select from among the returned at least one data record a requested data record satisfying the data requirement ;
and a key crystal granting access rights for the database system .




US20040230586A1

Filed: 2003-07-30     Issued: 2004-11-18

Geometrization for pattern recognition, data analysis, data merging, and multiple criteria decision making

(Original Assignee) Abel Wolman     

Abel Wolman
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups (more elements) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (more elements) .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (second partition) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20040230586A1
CLAIM 19
. A data scaling method comprising the steps of : (a) receiving data ;
(b) partitioning the received data ;
(c) forming one or more symmetric matrices from the partitioned received data ;
(d) forming a second partition (partitioning step) of the received data ;
(e) associating a scale type to each subset of the second partition of the received data ;
(f) applying admissible geometrization to the doubly partitioned received data to produce admissibly transformed data ;
and (g) interpreting the admissibly transformed data as scaled data .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (more elements) .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (more elements) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (more elements) .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .

US8190610B2
CLAIM 32
. The computer system of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (more elements) .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (more elements) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040230586A1
CLAIM 5
. The data scaling method of claim 3 wherein in step (b1) one or more of the data structures contain one or more elements (data groups) selected from the group consisting of missing values and augmenting values .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050198043A1

Filed: 2003-05-15     Issued: 2005-09-08

Database masking and privilege for organizations

(Original Assignee) Kintera Inc     (Current Assignee) Kintera Inc

Harry Gruber, Jeane Chen, Allen Gruber
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (level organization) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (more fields) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050198043A1
CLAIM 1
. A database in a computer system linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields (different schema, first schema) with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US20050198043A1
CLAIM 16
. The database according to claim 15 , wherein standard fields are shared with other sub-organizations and multi-level organizations (mapping functions) .
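
For orientation only, the map and reduce steps recited in US8190610B2 claim 1 can be sketched in a few lines of Python. This is an illustrative single-process sketch, not the patentee's implementation and not a claim construction: the two data groups, their schemas, the shared key dept_id, and every function name are hypothetical, and the distributed partitioning recited in the claim is not modeled.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
EMPLOYEES = [(1, "Ann"), (2, "Bob"), (1, "Cy")]   # schema: (dept_id, name)
DEPARTMENTS = [(1, "Sales", 3), (2, "Ops", 5)]    # schema: (dept_id, dept_name, floor)


def map_employee(record):
    """Map function for the first group: emit (key, value) from its schema."""
    dept_id, name = record
    return dept_id, name


def map_department(record):
    """Map function for the second group: a different schema, mapped differently."""
    dept_id, dept_name, floor = record
    return dept_id, (dept_name, floor)


def run_map(group_name, records, map_fn, intermediate):
    """Apply map_fn to each record; keep the output identifiable to its group."""
    for record in records:
        key, value = map_fn(record)
        intermediate[key][group_name].append(value)


def reduce_join(key, grouped_values):
    """Iterate each group's values separately and merge them on the common key."""
    rows = []
    for dept_name, floor in grouped_values["departments"]:
        for name in grouped_values["employees"]:
            rows.append((key, name, dept_name, floor))
    return rows


def grouped_map_reduce():
    """Run both map phases, then reduce key by key in sorted key order."""
    intermediate = defaultdict(lambda: defaultdict(list))
    run_map("employees", EMPLOYEES, map_employee, intermediate)
    run_map("departments", DEPARTMENTS, map_department, intermediate)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output
```

Calling grouped_map_reduce() on the sample data returns the employee rows joined to their department rows on dept_id, which is the kind of key-based merging of differently mapped intermediate data that the charted reference would need to disclose.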

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (more process) , for that reducer , operates according to a different key of a different schema (more fields) than the iterator corresponding to another particular data group , for that reducer .
US20050198043A1
CLAIM 1
. A database in a computer system linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors (particular data group) and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields (different schema, first schema) with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (level organization) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (more fields) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields (different schema, first schema) with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US20050198043A1
CLAIM 16
. The database according to claim 15 , wherein standard fields are shared with other sub-organizations and multi-level organizations (mapping functions) .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (more process) , for that reducer , is configured to operate according to a different key of a different schema (more fields) than the iterator corresponding to another particular data group , for that reducer .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors (particular data group) and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields (different schema, first schema) with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .
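
For orientation only, claims 23 to 25 of US8190610B2 differ in how intermediate key/value pairs are assigned to reducer partitions. Under one possible reading, claim 23 sends each pair to exactly one partition, while claims 24 and 25 allow pairs to be provided to more than one partition, up to every partition. The Python sketch below illustrates the two extremes; the function names and the CRC32-based assignment are assumptions made for this illustration and are not taken from the patent or from the charted reference.

```python
import zlib


def partition_exclusive(pairs, num_reducers):
    """Send each key/value pair to exactly one partition, chosen by its key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        # CRC32 is used because it is stable across runs, unlike str hash().
        index = zlib.crc32(str(key).encode("utf-8")) % num_reducers
        partitions[index].append((key, value))
    return partitions


def partition_broadcast(pairs, num_reducers):
    """Provide every key/value pair to every partition."""
    return [list(pairs) for _ in range(num_reducers)]
```

With the exclusive scheme, all pairs sharing a key land in the same partition, so one reducer sees every value for that key. With the broadcast scheme, each reducer receives the full intermediate data set.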

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (more fields) over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (more fields) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (level organization) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one (second set) or more fields (different schema, first schema) with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US20050198043A1
CLAIM 16
. The database according to claim 15 , wherein standard fields are shared with other sub-organizations and multi-level organizations (mapping functions) .

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (more fields) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (level organization) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (more fields) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one (second set) or more fields (different schema, first schema) with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US20050198043A1
CLAIM 16
. The database according to claim 15 , wherein standard fields are shared with other sub-organizations and multi-level organizations (mapping functions) .
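
For orientation, the operations recited in claim 40 of US8190610B2 above can be pictured with a minimal single-process sketch. It is not taken from the patent or from the cited reference. The data sets, the field names (`dept_id`, `name`, `title`) and all function names are hypothetical, and a real system would run the map and reduce steps on separate computing devices.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key "dept_id".
employees = [  # first schema: (dept_id, name)
    {"dept_id": 1, "name": "Ann"},
    {"dept_id": 2, "name": "Bob"},
    {"dept_id": 1, "name": "Cy"},
]
departments = [  # second schema: (dept_id, title)
    {"dept_id": 1, "title": "Sales"},
    {"dept_id": 2, "title": "Ops"},
]

def partition(records, n):
    """Split a data set into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_employees(part):
    """Map function for the first data group: emit (dept_id, name)."""
    return [(r["dept_id"], r["name"]) for r in part]

def map_departments(part):
    """Map function for the second data group: emit (dept_id, title)."""
    return [(r["dept_id"], r["title"]) for r in part]

def run_map(records, map_fn, n_parts=2):
    """Apply map_fn to each partition and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_parts):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return intermediate

def reduce_join(first, second):
    """Iterate over the keys common to both intermediate sets and merge them."""
    output = []
    for key in sorted(first.keys() & second.keys()):
        for title in second[key]:
            for name in first[key]:
                # Output schema (name, title) differs from both input schemas.
                output.append({"name": name, "title": title})
    return output

joined = reduce_join(run_map(employees, map_employees),
                     run_map(departments, map_departments))
```

In this sketch each data group gets its own map function, and the reduce step merges the two intermediate sets on the shared key, which is the behavior the claim language describes at a high level.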

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .
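
The partitioning recited in claim 42 of US8190610B2 above, where some intermediate key-value pairs go to more than one partition and each partition goes to a separate reducer, might look like the following sketch. The `replicated_keys` parameter, the integer keys and the counting reducer are assumptions made only for illustration. They do not come from the patent or the cited reference.

```python
def partition_with_replication(pairs, n_reducers, replicated_keys):
    """Assign each (key, value) pair to one partition, or to every partition
    when its key is in replicated_keys (a hypothetical configuration)."""
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if key in replicated_keys:
            for part in partitions:
                part.append((key, value))
        else:
            # Integer keys are assumed so that placement is deterministic.
            partitions[key % n_reducers].append((key, value))
    return partitions

def count_reducer(part):
    """Toy reducer: count the values seen for each key in one partition."""
    counts = {}
    for key, _ in part:
        counts[key] = counts.get(key, 0) + 1
    return counts

intermediate = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
partitions = partition_with_replication(intermediate, 2, replicated_keys={3})
# One reducer invocation per partition.
results = [count_reducer(p) for p in partitions]
```

Passing every key in `replicated_keys` would send all pairs to all reducers, which corresponds to the variant recited in claim 43.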

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .
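
Claim 44 of US8190610B2 above recites reducers that include a sort, group-by-key and combine task, together with metadata about those steps. The sketch below is one possible reading. The metadata fields (`input_pairs`, `sorted`, `distinct_keys`) and the choice of summation as the combine step are assumptions, since the claim does not say what the metadata contains.

```python
from itertools import groupby
from operator import itemgetter

def reducer_pipeline(pairs):
    """Sort, group by key, and combine the values, recording metadata per step."""
    metadata = {"input_pairs": len(pairs)}

    # Sort task: order the pairs by key so equal keys become adjacent.
    ordered = sorted(pairs, key=itemgetter(0))
    metadata["sorted"] = True

    # Group-by-key task: collect the values that share a key.
    grouped = {
        key: [value for _, value in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }
    metadata["distinct_keys"] = len(grouped)

    # Combine task: summation is used here purely as an example.
    combined = {key: sum(values) for key, values in grouped.items()}
    return combined, metadata

combined, metadata = reducer_pipeline([("b", 2), ("a", 1), ("b", 3)])
```

The returned `metadata` dictionary could then be handed to later reduce logic, in the spirit of claims 45 and 46.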

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20050198043A1
CLAIM 1
. A database in a computer system (computer system) linked to a network and configured to store one or more organizations' data , each organization having one or more sub-organizations , the computer system having one or more processors and one or more storage devices coupled to the processor for storing data , comprising : one or more virtual data islands partitioned inside the database , each virtual data island storing data of an organization ;
each virtual data islands further partitioned into one or more sub-islands , wherein each sub-island storing data for a sub-organization ;
one or more constituent records (CR) in each sub-island , each including one or more fields with data ;
wherein a sub-organization can share data from selected fields with organizations and other sub-organizations .




US20030233370A1

Filed: 2003-03-11     Issued: 2003-12-18

Maintaining a relationship between two different items of data

(Original Assignee) Miosoft Corp     (Current Assignee) Miosoft Corp

Albert Barabas, Ernst Siepmann, Mark Gulik
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030233370A1
CLAIM 19
. The method of claim 16 in which the role object includes a version number that is incremented each time the (first data, first data group) associated data item is updated .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with another reducer .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data includes data (includes data) items of the database that comprise objects in an object database .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with that reducer .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data includes data (includes data) items of the database that comprise objects in an object database .

US8190610B2
CLAIM 17
. A computer system (stored data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US20030233370A1
CLAIM 19
. The method of claim 16 in which the role object includes a version number that is incremented each time the (first data, first data group) associated data item is updated .

US8190610B2
CLAIM 18
. The computer system (stored data) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 19
. The computer system (stored data) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 20
. The computer system (stored data) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 21
. The computer system (stored data) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .
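
Claim 21 of US8190610B2 above recites employing, for each data group, an iterator that corresponds to that data group. A minimal sketch of that idea follows. The group names, sample values and function names are hypothetical and are not drawn from the patent or from the cited reference.

```python
# Hypothetical intermediate data, stored so that it stays identifiable
# to the data group that produced it: group name -> key -> list of values.
intermediate = {
    "employees": {1: ["Ann", "Cy"], 2: ["Bob"]},
    "departments": {1: ["Sales"], 2: ["Ops"]},
}

def make_group_iterators(intermediate_by_group, key):
    """Build one iterator per data group over that group's values for a key."""
    return {
        group: iter(values_by_key.get(key, []))
        for group, values_by_key in intermediate_by_group.items()
    }

def reduce_key(key, intermediate_by_group):
    """Reduce one key by consuming each group's own iterator."""
    iterators = make_group_iterators(intermediate_by_group, key)
    # Materialize the smaller side so it can be paired with every name.
    titles = list(iterators["departments"])
    return [(name, title) for name in iterators["employees"] for title in titles]

merged = reduce_key(1, intermediate)
```

Keeping a separate iterator per group lets the reduce logic treat each group's values differently even though they arrive under the same key.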

US8190610B2
CLAIM 22
. The computer system (stored data) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 23
. The computer system (stored data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 24
. The computer system (stored data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 25
. The computer system (stored data) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 26
. The computer system (stored data) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 27
. The computer system (stored data) of claim 26 , wherein : the reducing includes processing the metadata .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 28
. The computer system (stored data) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 29
. The computer system (stored data) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 30
. The computer system (stored data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with another reducer .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data (includes data) items of the database that comprise objects in an object database .

US8190610B2
CLAIM 31
. The computer system (stored data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with that reducer .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data (includes data) items of the database that comprise objects in an object database .

US8190610B2
CLAIM 32
. The computer system (stored data) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (stored data) , the method comprising : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US20030233370A1
CLAIM 19
. The method of claim 16 in which the role object includes a version number that is incremented each time the (first data, first data group) associated data item is updated .

US8190610B2
CLAIM 40
. A computer system (stored data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US20030233370A1
CLAIM 19
. The method of claim 16 in which the role object includes a version number that is incremented each time the (first data, first data group) associated data item is updated .

US8190610B2
CLAIM 41
. The computer system (stored data) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 42
. The computer system (stored data) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 43
. The computer system (stored data) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 44
. The computer system (stored data) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 45
. The computer system (stored data) of claim 44 , wherein the reducing includes processing the metadata .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .

US8190610B2
CLAIM 46
. The computer system (stored data) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20030233370A1
CLAIM 2
. The method of claim 1 in which the stored data (computer system) includes data items of the database that comprise objects in an object database .




US20030105782A1

Filed: 2002-05-20     Issued: 2003-06-05

Partially replicated distributed database with multiple levels of remote clients

(Original Assignee) Brodersen Robert A.; Prashant Chatterjee; Lim Peter S.     

Robert Brodersen, Prashant Chatterjee, Peter Lim
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (second transaction, first transaction) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step (second transaction, first transaction) of the reducing step further comprises processing data that is not intermediate data .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (second transaction, first transaction) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second transaction, first transaction) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (second transaction, first transaction) set having a second set (second transaction, first transaction) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .
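
For orientation only, the map-reduce method recited in claim 33 of US8190610B2 above can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not the patented or any cited implementation: the employee and department schemas, the function names, and the round-robin partitioning are all hypothetical, and the distributed execution the claim requires is not modelled.

```python
from collections import defaultdict

def partition(records, n):
    """Split records into n partitions (round-robin; an illustrative choice)."""
    return [records[i::n] for i in range(n)]

def map_employees(part):
    # Hypothetical first schema: (emp_id, name, dept_id). Key on dept_id.
    return [(dept_id, ("emp", name)) for _emp_id, name, dept_id in part]

def map_departments(part):
    # Hypothetical second schema: (dept_id, dept_name). Key on dept_id.
    return [(dept_id, ("dept", dept_name)) for dept_id, dept_name in part]

def run_maps(records, mapper, n_parts=2):
    """Apply one mapping function to every partition of one data group."""
    out = []
    for part in partition(records, n_parts):
        out.extend(mapper(part))
    return out

def reduce_together(first_intermediate, second_intermediate):
    """Merge both intermediate data sets on the key they have in common."""
    grouped = defaultdict(lambda: {"emp": [], "dept": []})
    for key, (group, value) in first_intermediate + second_intermediate:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        # Each group's values are iterated separately, then combined.
        for dept_name in grouped[key]["dept"]:
            for name in grouped[key]["emp"]:
                output.append((name, dept_name))  # third, output schema
    return output

employees = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]
departments = [(10, "Sales"), (20, "Ops")]
joined = reduce_together(
    run_maps(employees, map_employees),
    run_maps(departments, map_departments),
)
print(joined)
```

The output rows pair a name with a department name, a schema that differs from both inputs, which is the property the final limitation of the claim describes.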

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (second transaction, first transaction) set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (second transaction, first transaction) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .
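
The partitioning limitation of claim 35 of US8190610B2 above (some key-value pairs provided to more than one partition, each partition going to its own reducer) might look roughly like the following Python sketch. The `broadcast_keys` parameter, the modulo routing, and the toy reducer are assumptions made for illustration and do not come from the patent or the cited reference.

```python
def partition_intermediate(pairs, n_reducers, broadcast_keys=frozenset()):
    """Route integer-keyed pairs to reducer partitions.

    Pairs whose key is in broadcast_keys (a hypothetical parameter) are
    copied to every partition; all other pairs go to exactly one.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if key in broadcast_keys:
            for part in partitions:
                part.append((key, value))
        else:
            partitions[key % n_reducers].append((key, value))
    return partitions

def reducer(part):
    """Toy reducer: collect the values seen for each key in one partition."""
    out = {}
    for key, value in part:
        out.setdefault(key, []).append(value)
    return out

pairs = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
partitions = partition_intermediate(pairs, n_reducers=2, broadcast_keys={3})
results = [reducer(part) for part in partitions]  # one reducer per partition
print(results)
```

Passing every key in `broadcast_keys` would send all pairs to all reducers, which is the situation claim 36 describes.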

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (second transaction, first transaction) set are provided to all of the reducers .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second transaction, first transaction) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (second transaction, first transaction) set having a second set (second transaction, first transaction) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (second transaction, first transaction) set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data (second transaction, first transaction) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (second transaction, first transaction) set are provided to all of the reducers .
US20030105782A1
CLAIM 1
. A method of processing transactions at a workgroup server , comprising : receiving first transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a local database associated with a workgroup user client ;
receiving second transaction (second data, second intermediate data, second set, intermediate data processing step, second data set, second intermediate data set) information corresponding to a transaction at a master database associated with a master database server ;
updating a workgroup database based on said received first transaction information and said received second transaction information ;
identifying those transactions that did not originate at said master database server ;
and sending transaction information corresponding to said transactions that did not originate at said master database server to said master database server .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US7103590B1

Filed: 2001-08-24     Issued: 2006-09-05

Method and system for pipelined database table functions

(Original Assignee) Oracle International Corp     (Current Assignee) Oracle International Corp

Ravi Murthy, Ajay Sethi, Bhaskar Ghosh, Ashish Thusoo, Shashaanka Agrawal, Adiel M. Yoaz
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (based partitioning) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step, steps a) group has a different schema (different execution, compile time, data object) than the data of a second data group (repeating step, steps a) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US7103590B1
CLAIM 1
. A method for pipelining a table function in a database system , comprising : a) performing a set up operation when the table function is called , the table function being a user-defined function that produces rows of data and used in selection , iteration , or aggregation database query language statements ;
b) fetching a subset of output data from a data producer ;
c) sending the subset of the output data to a first consumer of the output data , wherein the first consumer is the table function ;
d) repeating step (first data, first data group, second data group) s b) and c) until all the output data has been fetched from the data producer .
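
To make steps a) through d) of US7103590B1 claim 1 above easier to compare against the target claim, here is a minimal Python sketch of a pipelined consumer. It is an assumption-laden illustration, not Oracle's implementation: `TableFunction`, `data_producer`, and the doubling transform are hypothetical names and behaviour chosen only to show the control flow.

```python
def data_producer(rows, batch_size):
    """Yield successive subsets (batches) of the producer's output."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

class TableFunction:
    """Hypothetical user-defined consumer that produces rows of data."""

    def __init__(self):
        self.ready = False
        self.output = []

    def setup(self):
        # Step a): set-up operation performed when the function is called.
        self.ready = True

    def consume(self, batch):
        # Step c): receive one subset and emit output rows immediately,
        # without waiting for the producer to finish.
        self.output.extend(row * 2 for row in batch)

def run_pipeline(rows, batch_size=2):
    table_function = TableFunction()
    table_function.setup()
    # Steps b) and d): fetch subsets repeatedly until the producer is exhausted.
    for batch in data_producer(rows, batch_size):
        table_function.consume(batch)
    return table_function.output

result = run_pipeline([1, 2, 3, 4, 5])
print(result)
```

Note that the loop moves data from a single producer to a single consumer in batches; nothing in this sketch involves key-value pairs, data groups, or differing schemas.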

US7103590B1
CLAIM 4
. The method of claim 1 in which the subset of the output data comprises a single data object (different schema) or row of data .

US7103590B1
CLAIM 9
. The method of claim 1 in which the table function executes in a different execution (different schema) thread than the data producer .

US7103590B1
CLAIM 14
. The method of claim 13 in which the dynamically configurable return type is established at compile time (different schema) .

US7103590B1
CLAIM 15
. The method of claim 1 in which steps a (first data, first data group, second data group) ) through d) are implemented within a database query language statement .

US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .
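
For readers unfamiliar with the two schemes named in US7103590B1 claim 23 above, the difference between hash and range based partitioning can be sketched as follows. This is a generic illustration assuming integer keys; the modulo stand-in for a hash function and the `boundaries` list are assumptions, not details taken from the reference.

```python
from bisect import bisect_right

def hash_partition(key, n_partitions):
    """Assign an integer key to a partition by a simple modulo hash."""
    return key % n_partitions

def range_partition(key, boundaries):
    """Assign a key to a partition by sorted upper boundaries.

    With boundaries [10, 30]: keys below 10 go to partition 0,
    keys from 10 up to 29 go to partition 1, and the rest to partition 2.
    """
    return bisect_right(boundaries, key)

keys = [3, 12, 25, 40]
by_hash = [hash_partition(k, 3) for k in keys]
by_range = [range_partition(k, [10, 30]) for k in keys]
print(by_hash, by_range)
```

Hash partitioning spreads keys without regard to order, while range partitioning keeps neighbouring keys together; either way each key lands in exactly one partition.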

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (based partitioning) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task (based partitioning) ;

the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .
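
The "sort , group-by-key and combine task" recited in claim 9 of US8190610B2 above can be pictured with a short Python sketch. The function name and the choice of `sum` as the combine operation are illustrative assumptions; the claim does not prescribe any particular combine function.

```python
from itertools import groupby
from operator import itemgetter

def sort_group_combine(pairs, combine=sum):
    """Sort pairs by key, group equal keys, and combine each group's values."""
    ordered = sorted(pairs, key=itemgetter(0))          # sort
    return [
        (key, combine(value for _key, value in group))  # combine
        for key, group in groupby(ordered, key=itemgetter(0))  # group-by-key
    ]

combined = sort_group_combine([("b", 2), ("a", 1), ("b", 3), ("a", 4)])
print(combined)
```

Sorting first matters because `groupby` only merges adjacent items with equal keys.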

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (different execution, compile time, data object) than the iterator corresponding to another particular data group , for that reducer .
US7103590B1
CLAIM 4
. The method of claim 1 in which the subset of the output data comprises a single data object (different schema) or row of data .

US7103590B1
CLAIM 9
. The method of claim 1 in which the table function executes in a different execution (different schema) thread than the data producer .

US7103590B1
CLAIM 14
. The method of claim 13 in which the dynamically configurable return type is established at compile time (different schema) .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (based partitioning) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step, steps a) group has a different schema (different execution, compile time, data object) than the data of a second data group (repeating step, steps a) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US7103590B1
CLAIM 1
. A method for pipelining a table function in a database system , comprising : a) performing a set up operation when the table function is called , the table function being a user-defined function that produces rows of data and used in selection , iteration , or aggregation database query language statements ;
b) fetching a subset of output data from a data producer ;
c) sending the subset of the output data to a first consumer of the output data , wherein the first consumer is the table function ;
d) repeating step (first data, first data group, second data group) s b) and c) until all the output data has been fetched from the data producer .

US7103590B1
CLAIM 4
. The method of claim 1 in which the subset of the output data comprises a single data object (different schema) or row of data .

US7103590B1
CLAIM 9
. The method of claim 1 in which the table function executes in a different execution (different schema) thread than the data producer .

US7103590B1
CLAIM 14
. The method of claim 13 in which the dynamically configurable return type is established at compile time (different schema) .

US7103590B1
CLAIM 15
. The method of claim 1 in which steps a (first data, first data group, second data group) ) through d) are implemented within a database query language statement .

US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (different execution, compile time, data object) than the iterator corresponding to another particular data group , for that reducer .
US7103590B1
CLAIM 4
. The method of claim 1 in which the subset of the output data comprises a single data object (different schema) or row of data .

US7103590B1
CLAIM 9
. The method of claim 1 in which the table function executes in a different execution (different schema) thread than the data producer .

US7103590B1
CLAIM 14
. The method of claim 13 in which the dynamically configurable return type is established at compile time (different schema) .

US8190610B2
CLAIM 26
. The computer system of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task (based partitioning) ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (different execution, compile time, data object) over a computer system , the method comprising : for a first data (repeating step, steps a) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (based partitioning) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step, steps a) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US7103590B1
CLAIM 1
. A method for pipelining a table function in a database system , comprising : a) performing a set up operation when the table function is called , the table function being a user-defined function that produces rows of data and used in selection , iteration , or aggregation database query language statements ;
b) fetching a subset of output data from a data producer ;
c) sending the subset of the output data to a first consumer of the output data , wherein the first consumer is the table function ;
d) repeating step (first data, first data group, second data group) s b) and c) until all the output data has been fetched from the data producer .

US7103590B1
CLAIM 4
. The method of claim 1 in which the subset of the output data comprises a single data object (different schema) or row of data .

US7103590B1
CLAIM 9
. The method of claim 1 in which the table function executes in a different execution (different schema) thread than the data producer .

US7103590B1
CLAIM 14
. The method of claim 13 in which the dynamically configurable return type is established at compile time (different schema) .

US7103590B1
CLAIM 15
. The method of claim 1 in which steps a (first data, first data group, second data group) ) through d) are implemented within a database query language statement .

US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task (based partitioning) , the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (repeating step, steps a) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (based partitioning) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step, steps a) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (different execution, compile time, data object) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US7103590B1
CLAIM 1
. A method for pipelining a table function in a database system , comprising : a) performing a set up operation when the table function is called , the table function being a user-defined function that produces rows of data and used in selection , iteration , or aggregation database query language statements ;
b) fetching a subset of output data from a data producer ;
c) sending the subset of the output data to a first consumer of the output data , wherein the first consumer is the table function ;
d) repeating steps (first data, first data group, second data group) b) and c) until all the output data has been fetched from the data producer .

US7103590B1
CLAIM 4
. The method of claim 1 in which the subset of the output data comprises a single data object (different schema) or row of data .

US7103590B1
CLAIM 9
. The method of claim 1 in which the table function executes in a different execution (different schema) thread than the data producer .

US7103590B1
CLAIM 14
. The method of claim 13 in which the dynamically configurable return type is established at compile time (different schema) .

US7103590B1
CLAIM 15
. The method of claim 1 in which steps a) (first data, first data group, second data group) through d) are implemented within a database query language statement .

US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task (based partitioning) , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US7103590B1
CLAIM 23
. The method of claim 22 in which the partitioning definition comprises either hash or range based partitioning (data partitions, partitioning step, combine task) .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030055822A1

Filed: 2001-07-17     Issued: 2003-03-20

Database systems, methods and computer program products including primary key and super key indexes for use with partitioned tables

(Original Assignee) Trendium Inc     (Current Assignee) Viavi Solutions Inc

Lin Yu
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (second partition, first partition) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (third portion, second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists (respective entity) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030055822A1
CLAIM 6
. The relational database system of claim 5 , wherein the partitioned database table comprises a first partitioned (partitioning step, data partitions) database table , the primary key comprises a first primary key and the super key comprises a first super key , the relational database further comprising : a second partitioned (partitioning step, data partitions) database table ;
a second primary key of the second database (second data, second data group, second data set) table , the second primary key comprising one or more columns of the second database table other than columns of the second database table used only for partitioning the second database table ;
a second super key of the second database table , the second super key comprising the columns of the second primary key and one or more columns of the database table which are used only for partitioning the database table ;
a super key index of the second partitioned database table based on the second super key ;
a third partitioned database table comprising a relationship table which relates entries in the first partitioned database table to entries in the second partitioned database table ;
a third primary key comprising the first primary key , the second primary key and one or more columns of the third partitioned database table used for partitioning the third partitioned database table ;
and an index of the third partitioned database table based on the one or more columns of the third portioned (second data, second data group, second data set) database table used for partitioning the third database table .

US20030055822A1
CLAIM 11
. A method of maintaining referential integrity between partitioned tables of a relational database , the method comprising : defining primary keys of at least two entity tables of the relational database so as to only include columns of the at least two entity tables other than columns used only for partitioning the respective entity (different lists, value pairs) tables of the at least two entity tables ;
defining super keys of the at least two entity tables of the relational database so as to include the respective primary keys and at least one column used only for partitioning the respective entity tables ;
defining super key indices for the at least two entity tables based on their respective super keys ;
defining a primary key of a relationship table associated with the at least two entity tables based on the primary keys of the at least two entity tables and at least one column of the relationship table used for partitioning the relationship table ;
and defining an index of the relationship table based on the at least one column of the relationship table used for partitioning the relationship table .

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (second partition, first partition) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20030055822A1
CLAIM 6
. The relational database system of claim 5 , wherein the partitioned database table comprises a first partitioned (partitioning step, data partitions) database table , the primary key comprises a first primary key and the super key comprises a first super key , the relational database further comprising : a second partitioned (partitioning step, data partitions) database table ;
a second primary key of the second database table , the second primary key comprising one or more columns of the second database table other than columns of the second database table used only for partitioning the second database table ;
a second super key of the second database table , the second super key comprising the columns of the second primary key and one or more columns of the database table which are used only for partitioning the database table ;
a super key index of the second partitioned database table based on the second super key ;
a third partitioned database table comprising a relationship table which relates entries in the first partitioned database table to entries in the second partitioned database table ;
a third primary key comprising the first primary key , the second primary key and one or more columns of the third partitioned database table used for partitioning the third partitioned database table ;
and an index of the third partitioned database table based on the one or more columns of the third portioned database table used for partitioning the third database table .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (second partition, first partition) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (third portion, second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists (respective entity) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030055822A1
CLAIM 6
. The relational database system of claim 5 , wherein the partitioned database table comprises a first partitioned (partitioning step, data partitions) database table , the primary key comprises a first primary key and the super key comprises a first super key , the relational database further comprising : a second partitioned (partitioning step, data partitions) database table ;
a second primary key of the second database (second data, second data group, second data set) table , the second primary key comprising one or more columns of the second database table other than columns of the second database table used only for partitioning the second database table ;
a second super key of the second database table , the second super key comprising the columns of the second primary key and one or more columns of the database table which are used only for partitioning the database table ;
a super key index of the second partitioned database table based on the second super key ;
a third partitioned database table comprising a relationship table which relates entries in the first partitioned database table to entries in the second partitioned database table ;
a third primary key comprising the first primary key , the second primary key and one or more columns of the third partitioned database table used for partitioning the third partitioned database table ;
and an index of the third partitioned database table based on the one or more columns of the third portioned (second data, second data group, second data set) database table used for partitioning the third database table .

US20030055822A1
CLAIM 11
. A method of maintaining referential integrity between partitioned tables of a relational database , the method comprising : defining primary keys of at least two entity tables of the relational database so as to only include columns of the at least two entity tables other than columns used only for partitioning the respective entity (different lists, value pairs) tables of the at least two entity tables ;
defining super keys of the at least two entity tables of the relational database so as to include the respective primary keys and at least one column used only for partitioning the respective entity tables ;
defining super key indices for the at least two entity tables based on their respective super keys ;
defining a primary key of a relationship table associated with the at least two entity tables based on the primary keys of the at least two entity tables and at least one column of the relationship table used for partitioning the relationship table ;
and defining an index of the relationship table based on the at least one column of the relationship table used for partitioning the relationship table .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (second partition, first partition) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (third portion, second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030055822A1
CLAIM 6
. The relational database system of claim 5 , wherein the partitioned database table comprises a first partitioned (partitioning step, data partitions) database table , the primary key comprises a first primary key and the super key comprises a first super key , the relational database further comprising : a second partitioned (partitioning step, data partitions) database table ;
a second primary key of the second database (second data, second data group, second data set) table , the second primary key comprising one or more columns of the second database table other than columns of the second database table used only for partitioning the second database table ;
a second super key of the second database table , the second super key comprising the columns of the second primary key and one or more columns of the database table which are used only for partitioning the database table ;
a super key index of the second partitioned database table based on the second super key ;
a third partitioned database table comprising a relationship table which relates entries in the first partitioned database table to entries in the second partitioned database table ;
a third primary key comprising the first primary key , the second primary key and one or more columns of the third partitioned database table used for partitioning the third partitioned database table ;
and an index of the third partitioned database table based on the one or more columns of the third portioned (second data, second data group, second data set) database table used for partitioning the third database table .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (second partition, first partition) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (third portion, second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030055822A1
CLAIM 6
. The relational database system of claim 5 , wherein the partitioned database table comprises a first partitioned (partitioning step, data partitions) database table , the primary key comprises a first primary key and the super key comprises a first super key , the relational database further comprising : a second partitioned (partitioning step, data partitions) database table ;
a second primary key of the second database (second data, second data group, second data set) table , the second primary key comprising one or more columns of the second database table other than columns of the second database table used only for partitioning the second database table ;
a second super key of the second database table , the second super key comprising the columns of the second primary key and one or more columns of the database table which are used only for partitioning the database table ;
a super key index of the second partitioned database table based on the second super key ;
a third partitioned database table comprising a relationship table which relates entries in the first partitioned database table to entries in the second partitioned database table ;
a third primary key comprising the first primary key , the second primary key and one or more columns of the third partitioned database table used for partitioning the third partitioned database table ;
and an index of the third partitioned database table based on the one or more columns of the third portioned (second data, second data group, second data set) database table used for partitioning the third database table .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20020049759A1

Filed: 2001-04-26     Issued: 2002-04-25

High performance relational database management system

(Original Assignee) Loren Christensen     (Current Assignee) LINMOR Inc

Loren Christensen
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data object) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20020049759A1
CLAIM 1
. A high performance relational database management system , leveraging the functionality of a high speed communications network , comprising the steps of : (i) receiving collected data objects (different schema) from at least one data collection node using at least one performance monitoring computer whereby a distributed database is created ;
(ii) partitioning the distributed database into data hunks using a histogram routine running on at least one performance monitoring server computer ;
(iii) importing the data hunks into a plurality of delegated database engine instances located on at least one performance monitoring server computer so as to parallel process the data hunks whereby processed data is generated ;
and (iv) accessing the processed data using at least one performance client computer to monitor data object performance .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (data object) than the iterator corresponding to another particular data group , for that reducer .
US20020049759A1
CLAIM 1
. A high performance relational database management system , leveraging the functionality of a high speed communications network , comprising the steps of : (i) receiving collected data objects (different schema) from at least one data collection node using at least one performance monitoring computer whereby a distributed database is created ;
(ii) partitioning the distributed database into data hunks using a histogram routine running on at least one performance monitoring server computer ;
(iii) importing the data hunks into a plurality of delegated database engine instances located on at least one performance monitoring server computer so as to parallel process the data hunks whereby processed data is generated ;
and (iv) accessing the processed data using at least one performance client computer to monitor data object performance .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data object) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20020049759A1
CLAIM 1
. A high performance relational database management system , leveraging the functionality of a high speed communications network , comprising the steps of : (i) receiving collected data objects (different schema) from at least one data collection node using at least one performance monitoring computer whereby a distributed database is created ;
(ii) partitioning the distributed database into data hunks using a histogram routine running on at least one performance monitoring server computer ;
(iii) importing the data hunks into a plurality of delegated database engine instances located on at least one performance monitoring server computer so as to parallel process the data hunks whereby processed data is generated ;
and (iv) accessing the processed data using at least one performance client computer to monitor data object performance .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (data object) than the iterator corresponding to another particular data group , for that reducer .
US20020049759A1
CLAIM 1
. A high performance relational database management system , leveraging the functionality of a high speed communications network , comprising the steps of : (i) receiving collected data objects (different schema) from at least one data collection node using at least one performance monitoring computer whereby a distributed database is created ;
(ii) partitioning the distributed database into data hunks using a histogram routine running on at least one performance monitoring server computer ;
(iii) importing the data hunks into a plurality of delegated database engine instances located on at least one performance monitoring server computer so as to parallel process the data hunks whereby processed data is generated ;
and (iv) accessing the processed data using at least one performance client computer to monitor data object performance .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (data object) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20020049759A1
CLAIM 1
. A high performance relational database management system , leveraging the functionality of a high speed communications network , comprising the steps of : (i) receiving collected data objects (different schema) from at least one data collection node using at least one performance monitoring computer whereby a distributed database is created ;
(ii) partitioning the distributed database into data hunks using a histogram routine running on at least one performance monitoring server computer ;
(iii) importing the data hunks into a plurality of delegated database engine instances located on at least one performance monitoring server computer so as to parallel process the data hunks whereby processed data is generated ;
and (iv) accessing the processed data using at least one performance client computer to monitor data object performance .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (data object) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20020049759A1
CLAIM 1
. A high performance relational database management system , leveraging the functionality of a high speed communications network , comprising the steps of : (i) receiving collected data object (different schema) s from at least one data collection node using at least one performance monitoring computer whereby a distributed database is created ;
(ii) partitioning the distributed database into data hunks using a histogram routine running on at least one performance monitoring server computer ;
(iii) importing the data hunks into a plurality of delegated database engine instances located on at least one performance monitoring server computer so as to parallel process the data hunks whereby processed data is generated ;
and (iv) accessing the processed data using at least one performance client computer to monitor data object performance .
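
For orientation only, the map and reduce steps recited in claim 40 of US8190610B2 can be pictured with a minimal single-process Python sketch. The record layouts, the sample data, and every name used (map_employees, map_departments, reduce_join, map_reduce_join) are hypothetical and are not taken from either patent; the sketch only illustrates two data groups with different schemas being partitioned and mapped separately, then reduced together on a key they have in common, producing an output with a third schema.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
EMPLOYEES = [("e1", "Ann", 10), ("e2", "Bob", 20), ("e3", "Cy", 10)]  # (emp_id, name, dept_id)
DEPARTMENTS = [(10, "Search"), (20, "Ads")]                           # (dept_id, dept_name)


def map_employees(partition):
    """Map function for the first group: emit (dept_id, name)."""
    for _emp_id, name, dept_id in partition:
        yield dept_id, name


def map_departments(partition):
    """Map function for the second group: emit (dept_id, dept_name)."""
    for dept_id, dept_name in partition:
        yield dept_id, dept_name


def split(data, n):
    """Split a data set into n partitions (round-robin, for illustration)."""
    return [data[i::n] for i in range(n)]


def run_map(map_fn, partitions):
    """Apply a map function to each partition and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partitions:
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return intermediate


def reduce_join(employee_names, department_names):
    """Reduce one key: walk each group's values separately, emit (dept_name, name)."""
    for dept_name in department_names:
        for name in employee_names:
            yield dept_name, name


def map_reduce_join(employees, departments, n_partitions=2):
    first = run_map(map_employees, split(employees, n_partitions))
    second = run_map(map_departments, split(departments, n_partitions))
    output = []
    # Iterate on the keys that appear in both intermediate data sets.
    for key in sorted(first.keys() & second.keys()):
        output.extend(reduce_join(first[key], second[key]))
    return output


RESULT = map_reduce_join(EMPLOYEES, DEPARTMENTS)
# RESULT == [("Search", "Ann"), ("Search", "Cy"), ("Ads", "Bob")]
```

The sketch runs in one process; the claim additionally requires that the operations be performed by a plurality of computing devices, which is not modeled here.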




US20020116404A1

Filed: 2001-01-25     Issued: 2002-08-22

Method and system for highly-parallel logging and recovery operation in main-memory transaction processing systems

(Original Assignee) Transact In Memory Inc     (Current Assignee) SAP SE ; Transact In Memory Inc

Sang Cha, Ju Lee, Ki Kim
US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with another reducer .
US20020116404A1
CLAIM 1
. A method of logging updates in a main-memory transaction-processing system having main memory for storing a database , one or more log (particular reducer) disks for storing log records for parallel recovery of the main memory database , and one or more backup disks for storing a copy of the main memory database , the method comprising the steps of : taking a before image of the database before an update to the database is made ;
taking an after image of the database after the update is made ;
generating a differential log as a log body of each log record by applying a bit-wise exclusive-OR (XOR) operation between the before image and the after image ;
recovering from a failure by applying the XOR operation between the differential log and the before-image .
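
The differential-logging steps of US20020116404A1 claim 1 rely on a basic property of bit-wise XOR: if the log body is the before image XOR the after image, then XOR-ing the log with the before image reproduces the after image. The following is a minimal Python sketch of that arithmetic only; the function names and the three-byte sample images are hypothetical, and disk layout, log records, and backup handling from the claim are not modeled.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bit-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("images must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_differential_log(before: bytes, after: bytes) -> bytes:
    """Log body = before image XOR after image."""
    return xor_bytes(before, after)


def redo(before: bytes, log: bytes) -> bytes:
    """Recover the after image from the before image and the differential log."""
    return xor_bytes(before, log)


def undo(after: bytes, log: bytes) -> bytes:
    """The same XOR applied to the after image yields the before image."""
    return xor_bytes(after, log)


# Hypothetical three-byte images, for illustration only.
before = bytes([0x0F, 0xF0, 0x55])
after = bytes([0xFF, 0x00, 0x55])
log = make_differential_log(before, after)  # bytes([0xF0, 0xF0, 0x00])
```

Unchanged bytes contribute zero to the log body, which is why the third byte of the sample log is 0x00.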

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with that reducer .
US20020116404A1
CLAIM 1
. A method of logging updates in a main-memory transaction-processing system having main memory for storing a database , one or more log (particular reducer) disks for storing log records for parallel recovery of the main memory database , and one or more backup disks for storing a copy of the main memory database , the method comprising the steps of : taking a before image of the database before an update to the database is made ;
taking an after image of the database after the update is made ;
generating a differential log as a log body of each log record by applying a bit-wise exclusive-OR (XOR) operation between the before image and the after image ;
recovering from a failure by applying the XOR operation between the differential log and the before-image .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with another reducer .
US20020116404A1
CLAIM 1
. A method of logging updates in a main-memory transaction-processing system having main memory for storing a database , one or more log (particular reducer) disks for storing log records for parallel recovery of the main memory database , and one or more backup disks for storing a copy of the main memory database , the method comprising the steps of : taking a before image of the database before an update to the database is made ;
taking an after image of the database after the update is made ;
generating a differential log as a log body of each log record by applying a bit-wise exclusive-OR (XOR) operation between the before image and the after image ;
recovering from a failure by applying the XOR operation between the differential log and the before-image .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with that reducer .
US20020116404A1
CLAIM 1
. A method of logging updates in a main-memory transaction-processing system having main memory for storing a database , one or more log (particular reducer) disks for storing log records for parallel recovery of the main memory database , and one or more backup disks for storing a copy of the main memory database , the method comprising the steps of : taking a before image of the database before an update to the database is made ;
taking an after image of the database after the update is made ;
generating a differential log as a log body of each log record by applying a bit-wise exclusive-OR (XOR) operation between the before image and the after image ;
recovering from a failure by applying the XOR operation between the differential log and the before-image .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (consistent state) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20020116404A1
CLAIM 8
. The method of claim 4 , wherein the step of recovering comprises the steps of : loading the backup data from said one or more backup disks into the main memory database ;
and loading the log from said one or more log disks into the main memory database in order to restore the main memory database to the most recent consistent state (second data set) .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (consistent state) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20020116404A1
CLAIM 8
. The method of claim 4 , wherein the step of recovering comprises the steps of : loading the backup data from said one or more backup disks into the main memory database ;
and loading the log from said one or more log disks into the main memory database in order to restore the main memory database to the most recent consistent state (second data set) .




US6678691B1

Filed: 2000-07-03     Issued: 2004-01-13

Method and system for generating corporate information

(Original Assignee) Koninklijke KPN NV     (Current Assignee) ATOS ORIGIN NEDERLAND BV ; Koninklijke KPN NV

Harald Kikkers
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups (system users) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data (intermediate data) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data model) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model (different schema) specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (system users) .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data (intermediate data) for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data (intermediate data) for a data group being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data (intermediate data) for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 6
. The method of claim 1 , wherein : the intermediate data (intermediate data) includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , each key/value pair of the intermediate data being provided to a separate one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 7
. The method of claim 1 , wherein : the intermediate data (intermediate data) includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , at least some of the key/value pairs of the intermediate data being provided to more than one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .
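
For orientation only, the two partitioning variants recited in claims 6 and 7 of US8190610B2 (and the all-to-all case of claim 8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names partition_exclusive and partition_replicated, the CRC32 routing policy, and the replicate predicate are all hypothetical and do not appear in either patent.

```python
import zlib


def _bucket(key, num_partitions):
    # Assumed routing policy: a deterministic hash of the key's text form.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_exclusive(pairs, num_partitions):
    """Claim 6 style: every key/value pair lands in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[_bucket(key, num_partitions)].append((key, value))
    return partitions


def partition_replicated(pairs, num_partitions, replicate):
    """Claim 7 style: pairs whose key satisfies `replicate` are copied to
    every partition; the remaining pairs are routed to a single partition.
    Passing `lambda key: True` gives the all-to-all case of claim 8."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        if replicate(key):
            for part in partitions:
                part.append((key, value))
        else:
            partitions[_bucket(key, num_partitions)].append((key, value))
    return partitions


pairs = [("a", 1), ("b", 2), ("a", 3)]
exclusive = partition_exclusive(pairs, 2)
replicated = partition_replicated(pairs, 2, lambda key: key == "a")
broadcast = partition_replicated(pairs, 2, lambda key: True)
```

In this sketch each partition would then be handed to its own reducer, which corresponds to the final limitation shared by claims 6 and 7.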

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step , all of the key/value pairs of the intermediate data (intermediate data) are provided to all of the partitions .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data (intermediate data) for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data (intermediate data) includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (one source) group , for that reducer , operates according to a different key of a different schema (data model) than the iterator corresponding to another particular data group , for that reducer .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source (particular data, particular data group) thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model (different schema) specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data (intermediate data) processing step of the reducing step further comprises processing data that is not intermediate data .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data (intermediate data) , for a particular reducer , includes data that is associated with another reducer .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data (intermediate data) , for a particular reducer , includes data that is associated with that reducer .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (system users) .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 17
. A computer system (intermediate data) including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups (system users) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data (intermediate data) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data model) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model (different schema) specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 18
. The computer system (intermediate data) of claim 17 , wherein : the at least one output data group is a plurality of output data groups (system users) .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 19
. The computer system (intermediate data) of claim 17 , wherein : corresponding intermediate data (intermediate data) for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 20
. The computer system (intermediate data) of claim 19 , wherein : corresponding intermediate data (intermediate data) for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 21
. The computer system (intermediate data) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data (intermediate data) for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .
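
Illustrative note on US8190610B2 claim 21: the limitation of employing "an iterator that corresponds to that data group" in the reduce stage can be pictured with the minimal Python sketch below. It is not taken from either patent. The function names (group_intermediate, reduce_join), the group labels ("emp", "dept") and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def group_intermediate(triples):
    """Group (group_id, key, value) triples by key, then by data group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for group_id, key, value in triples:
        grouped[key][group_id].append(value)
    return grouped


def reduce_join(key, values_by_group):
    """Merge the values for one key, using one iterator per data group."""
    emp_iter = iter(values_by_group.get("emp", []))    # iterator for group "emp"
    dept_iter = iter(values_by_group.get("dept", []))  # iterator for group "dept"
    # product() pairs every "emp" value with every "dept" value for this key.
    for emp_name, dept_name in product(emp_iter, dept_iter):
        yield (key, emp_name, dept_name)


# Hypothetical intermediate data: two groups with different value schemas
# that share an integer key.
intermediate = [
    ("emp", 10, "alice"),
    ("emp", 20, "bob"),
    ("dept", 10, "sales"),
    ("dept", 20, "support"),
    ("emp", 10, "carol"),
]

grouped = group_intermediate(intermediate)
joined = [row for key in sorted(grouped) for row in reduce_join(key, grouped[key])]
```

In this sketch the merge on the common key happens inside reduce_join, where each group's values are consumed through their own iterator. How the iterators are arranged is an assumption of the example, not a reading of the claim.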

US8190610B2
CLAIM 22
. The computer system (intermediate data) of claim 21 , wherein : the intermediate data (intermediate data) includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (one source) group , for that reducer , is configured to operate according to a different key of a different schema (data model) than the iterator corresponding to another particular data group , for that reducer .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source (particular data, particular data group) thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model (different schema) specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 23
. The computer system (intermediate data) of claim 17 , wherein : the intermediate data (intermediate data) includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .
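
Illustrative note on US8190610B2 claim 23: the partitioning limitation (each key/value pair provided to a separate one of the partitions, and each partition provided to a separate reducer) can be pictured with the minimal Python sketch below. It is not taken from either patent. The helper names (partition_index, partition_exclusive), the CRC32-based assignment and the sample pairs are assumptions made for illustration.

```python
import zlib


def partition_index(key, num_partitions):
    """Deterministically map a key to one partition (CRC32 is an assumed choice)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_exclusive(pairs, num_partitions):
    """Place every key/value pair in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical intermediate key/value pairs.
pairs = [(10, "alice"), (20, "bob"), (10, "sales"), (30, "dave")]
partitions = partition_exclusive(pairs, 3)

# Each partition would then be handed to its own reducer, for example:
# for index, partition in enumerate(partitions): run_reducer(index, partition)
# (run_reducer is a hypothetical placeholder and is not defined here.)
```

Because the assignment depends only on the key, all pairs that share a key land in the same partition, which is what lets a single reducer see every value for that key.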

US8190610B2
CLAIM 24
. The computer system (intermediate data) of claim 17 , wherein : the intermediate data (intermediate data) includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 25
. The computer system (intermediate data) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data (intermediate data) to all of the partitions .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .
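
Illustrative note on US8190610B2 claims 24 and 25: in contrast to an exclusive assignment, these claims recite providing at least some (claim 24) or all (claim 25) of the key/value pairs to more than one partition. The minimal Python sketch below shows the claim 25 case as a broadcast. It is not taken from either patent. The names partition_replicated and all_partitions and the sample pairs are hypothetical.

```python
def partition_replicated(pairs, num_partitions, targets_for):
    """Copy each key/value pair into every partition chosen by targets_for."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        for index in targets_for(key, num_partitions):
            partitions[index].append((key, value))
    return partitions


def all_partitions(key, num_partitions):
    """Broadcast policy: every pair goes to every partition."""
    return range(num_partitions)


# Hypothetical intermediate key/value pairs.
pairs = [(10, "alice"), (20, "bob")]
broadcast = partition_replicated(pairs, 3, all_partitions)
```

A policy function that returns only a subset of partition indexes for some keys would correspond to the broader "at least some ... to more than one of the partitions" wording of claim 24. That mapping is an interpretation made for the example.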

US8190610B2
CLAIM 26
. The computer system (intermediate data) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 27
. The computer system (intermediate data) of claim 26 , wherein : the reducing includes processing the metadata .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 28
. The computer system (intermediate data) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data (intermediate data) for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 29
. The computer system (intermediate data) of claim 17 , wherein : the intermediate data (intermediate data) processing of the reducing further comprises processing data that is not intermediate data .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 30
. The computer system (intermediate data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data (intermediate data) via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 31
. The computer system (intermediate data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data (intermediate data) via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 32
. The computer system (intermediate data) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (system users) .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (data model) over a computer system (intermediate data) , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data (intermediate data) set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema (structured data) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (structured data) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model (different schema) specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data (second schema, second set) , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .
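
Illustrative note on US8190610B2 claim 33: the sketch below is one possible reading of the recited steps, not the patentee's implementation and not part of either cited document. It runs in a single process, so it does not model the distributed system the claim requires. The employee and department records, the group tags, and all function names (partition, run_mappers, map_employee, map_department, reduce_together) are hypothetical and chosen only for illustration. Two data sets with different schemas are partitioned and mapped independently, each mapper emits key-value pairs on a common key, and a single reduce step iterates over the values of both groups for each key to produce output with a third schema.

```python
from collections import defaultdict


def partition(records, n):
    """Split records into n round-robin partitions (hypothetical partitioner)."""
    return [records[i::n] for i in range(n)]


def run_mappers(records, map_fn, group, n_partitions=2):
    """Apply map_fn to each partition independently and tag output with its group."""
    intermediate = []
    for part in partition(records, n_partitions):
        for record in part:
            for key, value in map_fn(record):
                intermediate.append((key, group, value))
    return intermediate


def map_employee(record):
    # First schema (assumed): (name, dept_id). Emits the common key dept_id.
    name, dept_id = record
    yield dept_id, name


def map_department(record):
    # Second schema (assumed): (dept_id, dept_name, floor).
    dept_id, dept_name, _floor = record
    yield dept_id, dept_name


def reduce_together(first, second):
    """Reduce both intermediate sets together, one iterator per data group."""
    by_key = defaultdict(lambda: defaultdict(list))
    for key, group, value in first + second:
        by_key[key][group].append(value)
    output = []
    for key in sorted(by_key):
        departments = by_key[key]["department"]
        for name in iter(by_key[key]["employee"]):
            for dept_name in iter(departments):
                # Output schema (name, dept_name) differs from both inputs.
                output.append((name, dept_name))
    return output


employees = [("alice", 1), ("bob", 2), ("carol", 1)]
departments = [(1, "search", 3), (2, "ads", 5)]

first = run_mappers(employees, map_employee, "employee")
second = run_mappers(departments, map_department, "department")
result = reduce_together(first, second)
print(result)
```

In this sketch the group tag carried with each intermediate pair is what lets the reduce step apply a separate iterator to each group's values for the same key.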

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (intermediate data) set so that the output data set (said system) is a merging of a portion of the first and second intermediate data set .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (intermediate data) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
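For orientation only, the partitioning recited in claims 35 and 36 can be sketched as below. This is a minimal sketch under stated assumptions, not the patentee's implementation: the names `partition_with_replication`, `single_route` and `broadcast_route` are hypothetical, and the character-sum routing rule is an arbitrary stand-in for whatever partitioner a real system would use.

```python
def partition_with_replication(pairs, num_reducers, route):
    """Split intermediate key-value pairs into one partition per reducer.

    `route` is a hypothetical user-supplied function that returns the list
    of partition indices for a key. Returning more than one index places the
    same pair in more than one partition (the situation described in claim 35).
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        for idx in route(key, num_reducers):
            partitions[idx].append((key, value))
    return partitions


def single_route(key, n):
    # Conventional case: each key goes to exactly one partition.
    # The character-sum rule is an assumption chosen to be deterministic.
    return [sum(map(ord, str(key))) % n]


def broadcast_route(key, n):
    # Extreme case (compare claim 36): every pair goes to every partition.
    return list(range(n))


def count_reducer(partition):
    # Stand-in reducer: counts the values seen for each key in its partition.
    counts = {}
    for key, _ in partition:
        counts[key] = counts.get(key, 0) + 1
    return counts


intermediate = [("a", 1), ("b", 2), ("a", 3)]
per_reducer = partition_with_replication(intermediate, 2, broadcast_route)
results = [count_reducer(p) for p in per_reducer]  # one result per reducer
```

With `broadcast_route`, each reducer receives the full intermediate data set; with `single_route`, each pair reaches exactly one reducer.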
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (intermediate data) set are provided to all of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 40
. A computer system (intermediate data) including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups (system users) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data (intermediate data) set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema (structured data) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (structured data) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (data model) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
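To make the operations recited in claim 40 concrete, the following is a minimal single-process sketch, assuming two hypothetical data groups (`employees` and `departments`) that have different schemas but share the key `dept_id`. All record contents and function names are illustrative assumptions; nothing here is taken from the patent's own examples, and the distribution across computing devices is omitted.

```python
from collections import defaultdict

# Hypothetical input: two data groups with different schemas and a common key.
employees = [
    {"dept_id": 1, "name": "Ann"},
    {"dept_id": 2, "name": "Bob"},
    {"dept_id": 1, "name": "Cy"},
]
departments = [
    {"dept_id": 1, "title": "R&D"},
    {"dept_id": 2, "title": "Sales"},
]


def map_employee(record):
    # Map function for the first schema: emits (dept_id, name).
    yield record["dept_id"], record["name"]


def map_department(record):
    # Map function for the second schema: emits (dept_id, title).
    yield record["dept_id"], record["title"]


def run_map(records, map_fn):
    # Applies one map function and collects a list of values per key,
    # forming that group's intermediate data set.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return grouped


def reduce_join(key, employee_iter, department_iter):
    # Reduce step with one iterator per data group; the output records
    # have a third schema (dept_id, name, title).
    titles = list(department_iter)
    for name in employee_iter:
        for title in titles:
            yield {"dept_id": key, "name": name, "title": title}


def map_reduce(first, second):
    inter1 = run_map(first, map_employee)
    inter2 = run_map(second, map_department)
    output = []
    # Iterate over the keys the two intermediate data sets have in common.
    for key in sorted(inter1.keys() & inter2.keys()):
        output.extend(reduce_join(key, iter(inter1[key]), iter(inter2[key])))
    return output


joined = map_reduce(employees, departments)
```

The separate iterator arguments to `reduce_join` are one way to keep the two groups' intermediate values distinguishable inside a single reduce call; other arrangements are equally possible.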
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users (data groups) , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model (different schema) specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data (second schema, second set) , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 41
. The computer system (intermediate data) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (intermediate data) set so that the output data set (said system) is a merging of a portion of the first and second intermediate data set .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 42
. The computer system (intermediate data) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data (intermediate data) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 43
. The computer system (intermediate data) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (intermediate data) set are provided to all of the reducers .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system (data set, first data set, second data set) comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 44
. The computer system (intermediate data) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 45
. The computer system (intermediate data) of claim 44 , wherein the reducing includes processing the metadata .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .

US8190610B2
CLAIM 46
. The computer system (intermediate data) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6678691B1
CLAIM 1
. A system for generating corporate information such as customer information and the like and providing said corporate information to system users , said corporate information originating from various automation systems which contain information on partly the same objects and being displayed as originating from one source thus providing an integrated image , said system comprising means for transforming the contents of a plurality of source data bases , each of them being loaded with legacy data records which are modelled according to a data model specific for that source database , to a target database having its own specific target data model , based upon the corporate data model , and said system comprising an intermediate merge database , including control means , between said source databases and said target database , said merge database having an intermediate data (intermediate data, computer system) structure , and being filled with data from said source database , while said data are re-grouped and re-structured according to said intermediate data structure ;
said target database being filled from said merge database with said re-grouped and re-structured data , according to said target data model , said target model being enabled to extract data by said system users via user front-end applications ;
said intermediate data structure of said merge database comprising following structure elements ;
a source identification (BI) indicating the origin of data records within the source database ;
a source reference (BR) indicating the origin of related data items in the source database ;
a target object identifier (TI) indicating a target object for each of said data record ;
a target object reference (TR) indicating a target object for each of said related data items ;
a status identifier (ST) indicating if data is current ;
and grouping attributes (GA) indicating fields into which data records can be divided in the case of a target object , and relational attributes (RA) indicating fields with which data records are related to a target object-identifier within the merge database ;
said transformation , by means of said merge database , comprising following basic operations : a PROJECTION operation , consisting of filling the table of the merge database from the source database and thereby changing said source identification (BI) , said source reference (BR) , said status identifier (ST) , and said attributes (GA , RA) , by said merge database , in conformity to the different data models of the respective databases ;
a GROUPING operation , consisting of determining the target object identifier of each data record and thereby changing said target identifier (TI) and reading said status identifier (ST) and said grouping attribute ;
a RELATING operation , consisting of determining the target object identifier of each related data item and thereby reading said target object identifier (TI) , said status identifier and said relational attribute (RA) , while changing said target object reference (TR) ;
and optionally a CALCULATING operation , consisting of an evaluation of each attribute of a table of the merge database .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1032175A2

Filed: 2000-01-05     Issued: 2000-08-30

System and method for transferring partitioned data sets over multiple threads

(Original Assignee) Sun Microsystems Inc     (Current Assignee) Sun Microsystems Inc

Wojciech Gasior, Aaron Hughes
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (data partition) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1032175A2
CLAIM 1
A distributive computer system for transferring a data set between a first computer storage location and a second computer storage location , the system comprising : a plurality of communication nodes , coupling the first and second computer storage locations , each of the plurality of communication nodes serving as a communication thread ;
a data transfer controller , coupled to the first or second computer storage location and the plurality of communication nodes , to select a number of communication threads to serve as data transfer links between the first and second computer storage location ;
a data partitioner (data partition) , coupled to the first computer storage location , wherein the data partitioner is responsive to a data set transfer request to partition the data set into transferable data set partitions and wherein the data transfer controller transfers the data set partitions over the communication threads in parallel .

EP1032175A2
CLAIM 4
The distributive computer system according to claim 3 wherein a first set of the data set partitions equal to the number of selected communication threads are transferred over the communication threads in parallel and any remaining data set partitions are transferred via communication threads that have completed transferring a corresponding portion (mapping functions) of the first set of data set partitions such that some of the data set partitions are transferred serially over the parallel communication threads .
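
For orientation, the grouped map and reduce steps recited in claim 1 of US8190610B2 above can be sketched as follows. This is a minimal single-process Python illustration, not the patentee's implementation and not a distributed system. The employee and department records, all function names, and the join-style reduce are assumptions made for the example only.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share one key (a department id).
employees = [("d1", {"name": "Ann"}), ("d2", {"name": "Bob"}), ("d1", {"name": "Cy"})]
departments = [("d1", {"title": "Sales"}), ("d2", {"title": "Ops"})]

def partition(records, n):
    """Split one data group into n partitions of key-value pairs."""
    return [records[i::n] for i in range(n)]

def map_employee(part):
    """Map function for the employee group: key -> list of names."""
    out = defaultdict(list)
    for key, value in part:
        out[key].append(value["name"])
    return out

def map_department(part):
    """Map function for the department group: key -> list of titles."""
    out = defaultdict(list)
    for key, value in part:
        out[key].append(value["title"])
    return out

def run_maps(groups, n_partitions=2):
    """Run each group's own map function per partition, keeping the group tag."""
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, values in map_fn(part).items():
                intermediate[key][group_name].extend(values)
    return intermediate

def reduce_join(intermediate):
    """Merge the groups on the common key, iterating over each group separately."""
    output = []
    for key in sorted(intermediate):
        by_group = intermediate[key]
        for title in by_group.get("department", []):
            for name in by_group.get("employee", []):
                output.append((key, title, name))
    return output

groups = {
    "employee": (employees, map_employee),
    "department": (departments, map_department),
}
result = reduce_join(run_maps(groups))
print(result)
```

The intermediate data stays tagged by group so that the reduce step can treat each group's values differently while still merging on the shared key.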

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (partitioning step) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
EP1032175A2
CLAIM 14
The method according to claim 9 wherein the data set is comprised of data rows having a total range and the partitioning step (partitioning step) further comprises : defining a partition column to mark each data set partition ;
selecting a subrange of data rows to define each data set partition ;
assigning the range of data rows to a given data set partition .
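
The range partitioning and parallel transfer recited in EP1032175A2 claims 1, 4 and 14 above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the reference's implementation: a Python thread pool stands in for the communication threads, the `transfer` function is a hypothetical placeholder for a network send, and the partition id stored with each row stands in for the claimed partition column.

```python
from concurrent.futures import ThreadPoolExecutor

def make_partitions(rows, n_partitions):
    """Assign contiguous subranges of rows to at most n_partitions partitions.

    Each row is tagged with its partition id (assumed stand-in for a partition column).
    """
    if not rows:
        return []
    size = -(-len(rows) // n_partitions)  # ceiling division
    return [
        [(pid, row) for row in rows[start:start + size]]
        for pid, start in enumerate(range(0, len(rows), size))
    ]

def transfer(part):
    """Hypothetical transfer step: returns a copy, standing in for a network send."""
    return list(part)

def transfer_all(rows, n_partitions, n_threads):
    """Send partitions over a fixed number of threads.

    With more partitions than threads, a thread that finishes one partition
    picks up a remaining one, so some partitions go serially over the same thread.
    """
    partitions = make_partitions(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        received = list(pool.map(transfer, partitions))  # map preserves input order
    return [tagged for part in received for tagged in part]

rows = list(range(10))
moved = transfer_all(rows, n_partitions=5, n_threads=2)
print(moved)
```

Collecting results in the calling thread avoids shared mutable state between workers, which keeps the sketch simple.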

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (data partition) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1032175A2
CLAIM 1
A distributive computer system for transferring a data set between a first computer storage location and a second computer storage location , the system comprising : a plurality of communication nodes , coupling the first and second computer storage locations , each of the plurality of communication nodes serving as a communication thread ;
a data transfer controller , coupled to the first or second computer storage location and the plurality of communication nodes , to select a number of communication threads to serve as data transfer links between the first and second computer storage location ;
a data partitioner (data partition) , coupled to the first computer storage location , wherein the data partitioner is responsive to a data set transfer request to partition the data set into transferable data set partitions and wherein the data transfer controller transfers the data set partitions over the communication threads in parallel .

EP1032175A2
CLAIM 4
The distributive computer system according to claim 3 wherein a first set of the data set partitions equal to the number of selected communication threads are transferred over the communication threads in parallel and any remaining data set partitions are transferred via communication threads that have completed transferring a corresponding portion (mapping functions) of the first set of data set partitions such that some of the data set partitions are transferred serially over the parallel communication threads .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (data partition) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1032175A2
CLAIM 1
A distributive computer system for transferring a data set between a first computer storage location and a second computer storage location , the system comprising : a plurality of communication nodes , coupling the first and second computer storage locations , each of the plurality of communication nodes serving as a communication thread ;
a data transfer controller , coupled to the first or second computer storage location and the plurality of communication nodes , to select a number of communication threads to serve as data transfer links between the first and second computer storage location ;
a data partitioner (data partition) , coupled to the first computer storage location , wherein the data partitioner is responsive to a data set transfer request to partition the data set into transferable data set partitions and wherein the data transfer controller transfers the data set partitions over the communication threads in parallel .

EP1032175A2
CLAIM 4
The distributive computer system according to claim 3 wherein a first set (first set) of the data set partitions equal to the number of selected communication threads are transferred over the communication threads in parallel and any remaining data set partitions are transferred via communication threads that have completed transferring a corresponding portion (mapping functions) of the first set of data set partitions such that some of the data set partitions are transferred serially over the parallel communication threads .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (data partition) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1032175A2
CLAIM 1
A distributive computer system for transferring a data set between a first computer storage location and a second computer storage location , the system comprising : a plurality of communication nodes , coupling the first and second computer storage locations , each of the plurality of communication nodes serving as a communication thread ;
a data transfer controller , coupled to the first or second computer storage location and the plurality of communication nodes , to select a number of communication threads to serve as data transfer links between the first and second computer storage location ;
a data partitioner (data partition) , coupled to the first computer storage location , wherein the data partitioner is responsive to a data set transfer request to partition the data set into transferable data set partitions and wherein the data transfer controller transfers the data set partitions over the communication threads in parallel .

EP1032175A2
CLAIM 4
The distributive computer system according to claim 3 wherein a first set (first set) of the data set partitions equal to the number of selected communication threads are transferred over the communication threads in parallel and any remaining data set partitions are transferred via communication threads that have completed transferring a corresponding portion (mapping functions) of the first set of data set partitions such that some of the data set partitions are transferred serially over the parallel communication threads .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1245936A

Filed: 1999-07-14     Issued: 2000-03-01

Fixed-format word processing method and apparatus (固定格式文字处理方法与装置)

(Original Assignee) Panasonic Corp     (Current Assignee) Panasonic Corp

陈惠嫈, 吴玲华
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个, "second") to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (第二个, "second") group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
CN1245936A
CLAIM 2
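. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, an association database storing document-associated item data, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among document editing and management, creating a template, and search processing; if document editing and management is selected, performing, through a second [第二个] (first data, first data group, first data set, s corresponding data partition) command input, at least one of document editing and document management on a specified folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup can be performed in the association database of associated data according to an entered primary key value, to generate associated data in the document, and so that mutually associated data are recorded in the association database; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.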

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (一个输入, "an input") .
CN1245936A
CLAIM 1
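. A fixed-format word processing method for use in a computer system having a data buffer, an input [一个输入] (output data groups, output data set) section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, a layout library storing document formats, and an item list menu storing document items; providing a command input to select between creating a template and performing search processing; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a plurality of document items; selecting a predefined item list from the item list menu for redesigning a layout; defining new document items and designing a new layout containing the document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and selectively retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.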

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task (产生关联, "generate association") ;

the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
CN1245936A
CLAIM 2
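. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, an association database storing document-associated item data, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among document editing and management, creating a template, and search processing; if document editing and management is selected, performing, through a second command input, at least one of document editing and document management on a specified folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup can be performed in the association database of associated data according to an entered primary key value, to generate associated [产生关联] (combine task) data in the document, and so that mutually associated data are recorded in the association database; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.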

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (输入命令, "input command") of a different schema than the iterator corresponding to another particular data group , for that reducer .
CN1245936A
CLAIM 4
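. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: said data buffer being divided into: a document library storing data of different folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to said input section and said output section, for selecting among document editing and management, creating a template, and performing search processing according to an input command [输入命令] (different key) received from said input section; document editing and management means, connected to said control means and said data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified folder of said document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character entry, and the document management including at least one of document modification, deletion, saving and printing, said document editing and management means parsing the processed data according to the different item types of the document data and storing the parsed data in said document library; association management means, connected to said document editing and management means and said data buffer, for establishing associated data item management in said item list menu so as to allow a lookup in said association database of the associated data corresponding to entered item data, so as to generate associated data in the document, and for recording the relevant associated data in said association database; template editing and management means, connected to said control means and said data buffer, for selecting for modification from an item list containing a plurality of document items and from the existing layouts of said item list menu and said layout library, and for defining new document items and designing a new layout containing these document items, said template editing and management means storing the new document items and the new layout in said item list menu and said layout library respectively; data search means, connected to said control means and said data buffer, for receiving a specified search condition from the control means through the input section, comparing at least one specified folder retrieved from said document library with the search condition, and retrieving all documents in said document library that satisfy the search condition in said at least one specified folder; said control means including means for combining the corresponding document data, layout and item list of a folder in said document library to form a specified complete document and outputting the complete document through said output section.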

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个, "second") to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (第二个, "second") group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1245936A
CLAIM 2
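. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, an association database storing document-associated item data, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among document editing and management, creating a template, and search processing; if document editing and management is selected, performing, through a second [第二个] (first data, first data group, first data set, s corresponding data partition) command input, at least one of document editing and document management on a specified folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup can be performed in the association database of associated data according to an entered primary key value, to generate associated data in the document, and so that mutually associated data are recorded in the association database; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.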

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (一个输入, "an input") .
CN1245936A
CLAIM 1
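. A fixed-format word processing method for use in a computer system having a data buffer, an input [一个输入] (output data groups, output data set) section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, a layout library storing document formats, and an item list menu storing document items; providing a command input to select between creating a template and performing search processing; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a plurality of document items; selecting a predefined item list from the item list menu for redesigning a layout; defining new document items and designing a new layout containing the document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and selectively retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.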

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (输入命令, "input command") of a different schema than the iterator corresponding to another particular data group , for that reducer .
CN1245936A
CLAIM 4
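. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: said data buffer being divided into: a document library storing data of different folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to said input section and said output section, for selecting among document editing and management, creating a template, and performing search processing according to an input command [输入命令] (different key) received from said input section; document editing and management means, connected to said control means and said data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified folder of said document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character entry, and the document management including at least one of document modification, deletion, saving and printing, said document editing and management means parsing the processed data according to the different item types of the document data and storing the parsed data in said document library; association management means, connected to said document editing and management means and said data buffer, for establishing associated data item management in said item list menu so as to allow a lookup in said association database of the associated data corresponding to entered item data, so as to generate associated data in the document, and for recording the relevant associated data in said association database; template editing and management means, connected to said control means and said data buffer, for selecting for modification from an item list containing a plurality of document items and from the existing layouts of said item list menu and said layout library, and for defining new document items and designing a new layout containing these document items, said template editing and management means storing the new document items and the new layout in said item list menu and said layout library respectively; data search means, connected to said control means and said data buffer, for receiving a specified search condition from the control means through the input section, comparing at least one specified folder retrieved from said document library with the search condition, and retrieving all documents in said document library that satisfy the search condition in said at least one specified folder; said control means including means for combining the corresponding document data, layout and item list of a folder in said document library to form a specified complete document and outputting the complete document through said output section.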

US8190610B2
CLAIM 26
. The computer system of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task (产生关联, "generate association") ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
CN1245936A
CLAIM 2
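. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, an association database storing document-associated item data, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among document editing and management, creating a template, and search processing; if document editing and management is selected, performing, through a second command input, at least one of document editing and document management on a specified folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup can be performed in the association database of associated data according to an entered primary key value, to generate associated [产生关联] (combine task) data in the document, and so that mutually associated data are recorded in the association database; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.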

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (第二个, "second") set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个, "second") to form a first intermediate data set (指定的一个, "a specified one") having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (接收一, "receive a") of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (一个输入, "an input") , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
CN1245936A
CLAIM 1
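. A fixed-format word processing method for use in a computer system having a data buffer, an input [一个输入] (output data groups, output data set) section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, a layout library storing document formats, and an item list menu storing document items; providing a command input to select between creating a template and performing search processing; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a plurality of document items; selecting a predefined item list from the item list menu for redesigning a layout; defining new document items and designing a new layout containing the document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and selectively retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.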

CN1245936A
CLAIM 2
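. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, an association database storing document-associated item data, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among document editing and management, creating a template, and search processing; if document editing and management is selected, performing, through a second [第二个] (first data, first data group, first data set, s corresponding data partition) command input, at least one of document editing and document management on a specified folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup can be performed in the association database of associated data according to an entered primary key value, to generate associated data in the document, and so that mutually associated data are recorded in the association database; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.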

CN1245936A
CLAIM 3
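. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: said data buffer being divided into: a document library storing data of different folders, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to said input section and said output section, for selecting between creating a template and performing search processing according to an input command received from said input section; template editing and management means, connected to said control means and said data buffer, for selecting for modification at least one of an item list containing a plurality of document items and an existing layout from said item list menu and said layout library, defining new document items, and designing a new layout containing these document items, the new document items and the new layout being stored in said item list menu and said layout library respectively; data search means, connected to said control means and said data buffer, for receiving a [接收一] (second set) specified search condition from said control means through said input section, retrieving at least one specified folder from said document library for comparison with the search condition, and retrieving all documents satisfying the search condition in the at least one specified folder selected from said document library; said control means including means for combining the corresponding document data, layout and item list of a folder in said document library to form a specified complete document and outputting the complete document through said output section.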

CN1245936A
CLAIM 4
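. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: said data buffer being divided into: a document library storing data of different folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to said input section and said output section, for selecting among document editing and management, creating a template, and performing search processing according to an input command received from said input section; document editing and management means, connected to said control means and said data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified folder of said document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character entry, and the document management including at least one of document modification, deletion, saving and printing, said document editing and management means parsing the processed data according to the different item types of the document data and storing the parsed data in said document library; association management means, connected to said document editing and management means and said data buffer, for establishing associated data item management in said item list menu so as to allow a lookup in said association database of the associated data corresponding to entered item data, so as to generate associated data in the document, and for recording the relevant associated data in said association database; template editing and management means, connected to said control means and said data buffer, for selecting for modification from an item list containing a plurality of document items and from the existing layouts of said item list menu and said layout library, and for defining new document items and designing a new layout containing these document items, said template editing and management means storing the new document items and the new layout in said item list menu and said layout library respectively; data search means, connected to said control means and said data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified folder retrieved from said document library with the search condition, and retrieving all documents in said document library that satisfy the search condition in said at least one specified folder; said control means including means for combining the corresponding document data, layout and item list of a folder in said document library to form a specified complete document and outputting the complete document through said output section.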

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (指定的一个, "a specified one") so that the output data set (一个输入, "an input") is a merging of a portion of the first and second intermediate data set .
CN1245936A
CLAIM 1
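. A fixed-format word processing method for use in a computer system having a data buffer, an input [一个输入] (output data groups, output data set) section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing folders of different types, wherein each folder holds document data of the same format, a layout library storing document formats, and an item list menu storing document items; providing a command input to select between creating a template and performing search processing; if creating a template is selected, performing at least one of the following sub-steps: selecting a layout for modification from the layout library, which already contains a plurality of document items; selecting a predefined item list from the item list menu for redesigning a layout; defining new document items and designing a new layout containing the document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied across the entire document library or within a specified folder; if the search is only within a specified folder, retrieving the specified folder from the document library into the data buffer for comparison; if the search is across the entire document library, comparing the contents of the document library with the search condition and selectively retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a folder in the document library to form a specified complete document, and outputting the complete document through the output section.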

CN1245936A
CLAIM 4
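. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.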
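
For orientation only, the reducing step recited in claim 34 of US8190610B2 (iterating on a key from each of the two intermediate data sets so that the output merges a portion of both) might be pictured with the Python sketch below. It is an illustration under stated assumptions, not the patentee's implementation: the function name `merge_reduce`, the tuple layout, and the assumption that each input is sorted by key with unique keys are all hypothetical.

```python
def merge_reduce(first, second):
    """Merge two intermediate data sets on their common keys.

    Assumptions (hypothetical, for illustration): each input is a list of
    (key, value) pairs, sorted by key, with at most one pair per key.
    One iterator is advanced per intermediate data set.
    """
    it_first, it_second = iter(first), iter(second)
    a, b = next(it_first, None), next(it_second, None)
    output = []
    while a is not None and b is not None:
        if a[0] < b[0]:
            a = next(it_first, None)      # key only in the first set: skip
        elif a[0] > b[0]:
            b = next(it_second, None)     # key only in the second set: skip
        else:
            # Key in common: emit a record with a new, merged schema.
            output.append((a[0], a[1], b[1]))
            a, b = next(it_first, None), next(it_second, None)
    return output
```

Only the keys present in both inputs reach the output, which is one reading of "a merging of a portion of the first and second intermediate data set".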

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (指定的一个) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
CN1245936A
CLAIM 4
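. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.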
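
As a reading aid for claim 35 of US8190610B2, the sketch below shows one way intermediate key-value pairs could be partitioned so that some pairs reach more than one partition, with each partition then handed to its own reducer. The function name, the `replicated_keys` parameter, and the hash-modulo placement rule are hypothetical choices made for illustration; the patent does not prescribe them.

```python
def partition_with_replication(pairs, n_partitions, replicated_keys):
    """Split (key, value) pairs into n_partitions lists.

    Assumption (hypothetical): keys listed in replicated_keys are copied to
    every partition; all other keys go to exactly one partition chosen by
    hash(key) modulo n_partitions.
    """
    partitions = [[] for _ in range(n_partitions)]
    for key, value in pairs:
        if key in replicated_keys:
            targets = range(n_partitions)          # provided to all partitions
        else:
            targets = [hash(key) % n_partitions]   # provided to one partition
        for index in targets:
            partitions[index].append((key, value))
    return partitions
```

Setting `replicated_keys` to every key in the input corresponds to the situation of claim 36, where all key-value pairs are provided to all reducers.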

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (指定的一个) are provided to all of the reducers .
CN1245936A
CLAIM 4
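. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.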

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task (产生关联) , the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
CN1245936A
CLAIM 2
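. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing document folders of different types, wherein each document folder holds document data of the same format, an association database storing associated item data of documents, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among performing document editing and management, template creation, and search processing; if document editing and management is selected, performing, through a second command input, at least one of document editing and document management on a specified document folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed processed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup in the association database of associated data is permitted according to an entered primary key value in order to produce associated [产生关联] (combine task) data in the document, and so that mutually associated data is recorded in the association database; if template creation is selected, performing at least one of the following sub-steps: selecting, for modification, a layout from the layout library that already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied to the entire document library or to a specified document folder; if the search is to be performed only in a specified document folder, retrieving the specified document folder from the document library into the data buffer for comparison; if the search is to be performed in the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document, and outputting the complete document through the output section.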
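
To make the "sort, group-by-key and combine task" of claim 37 of US8190610B2 concrete, a minimal Python sketch follows. It assumes, hypothetically, that the reducer receives (key, value) pairs and that combining means applying an aggregate such as `sum`; the metadata generation also recited in claim 37 is not modelled here.

```python
from itertools import groupby
from operator import itemgetter

def sort_group_combine(pairs, combine=sum):
    """Sort pairs by key, group equal keys, and combine each group's values.

    The combine callable is an assumption for illustration; any function
    that accepts an iterable of values could be substituted.
    """
    ordered = sorted(pairs, key=itemgetter(0))           # sort task
    return [
        (key, combine(value for _, value in group))      # combine task
        for key, group in groupby(ordered, key=itemgetter(0))  # group-by-key task
    ]
```

`itertools.groupby` only groups adjacent items, which is why the sort step must come first.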

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (第二个) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form a first intermediate data set (指定的一个) having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (接收一) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (一个输入) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CN1245936A
CLAIM 1
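. A fixed-format word processing method for use in a computer system having a data buffer, an input [一个输入] (output data groups, output data set) section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing document folders of different types, wherein each document folder holds document data of the same format, a layout library storing document formats, and an item list menu storing document items; providing a command input to select between performing template creation and search processing; if template creation is selected, performing at least one of the following sub-steps: selecting, for modification, a layout from the layout library that already contains a plurality of document items; selecting a predefined item list from the item list menu for redesigning a layout; defining new document items and designing a new layout containing the document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied to the entire document library or to a specified document folder; if the search is to be performed only in a specified document folder, retrieving the specified document folder from the document library into the data buffer for comparison; if the search is to be performed in the entire document library, comparing the contents of the document library with the search condition and selectively retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document, and outputting the complete document through the output section.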

CN1245936A
CLAIM 2
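. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing document folders of different types, wherein each document folder holds document data of the same format, an association database storing associated item data of documents, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among performing document editing and management, template creation, and search processing; if document editing and management is selected, performing, through a second [第二个] (first data, first data group, first data set, s corresponding data partition) command input, at least one of document editing and document management on a specified document folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed processed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup in the association database of associated data is permitted according to an entered primary key value in order to produce associated data in the document, and so that mutually associated data is recorded in the association database; if template creation is selected, performing at least one of the following sub-steps: selecting, for modification, a layout from the layout library that already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied to the entire document library or to a specified document folder; if the search is to be performed only in a specified document folder, retrieving the specified document folder from the document library into the data buffer for comparison; if the search is to be performed in the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document, and outputting the complete document through the output section.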

CN1245936A
CLAIM 3
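. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting between template creation and search processing according to an input command received from the input section; template editing management means, connected to the control means and the data buffer, for selecting, for modification, at least one from among an item list containing a plurality of document items and the existing layouts in the item list menu and the layout library, defining new document items, and designing a new layout containing these document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a [接收一] (second set) specified search condition from the control means through the input section, retrieving at least one specified document folder from the document library for comparison with the search condition, and retrieving all documents satisfying the search condition in the at least one specified document folder selected from the document library; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.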

CN1245936A
CLAIM 4
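. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.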

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (指定的一个) so that the output data set (一个输入) is a merging of a portion of the first and second intermediate data set .
CN1245936A
CLAIM 1
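. A fixed-format word processing method for use in a computer system having a data buffer, an input [一个输入] (output data groups, output data set) section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing document folders of different types, wherein each document folder holds document data of the same format, a layout library storing document formats, and an item list menu storing document items; providing a command input to select between performing template creation and search processing; if template creation is selected, performing at least one of the following sub-steps: selecting, for modification, a layout from the layout library that already contains a plurality of document items; selecting a predefined item list from the item list menu for redesigning a layout; defining new document items and designing a new layout containing the document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied to the entire document library or to a specified document folder; if the search is to be performed only in a specified document folder, retrieving the specified document folder from the document library into the data buffer for comparison; if the search is to be performed in the entire document library, comparing the contents of the document library with the search condition and selectively retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document, and outputting the complete document through the output section.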

CN1245936A
CLAIM 4
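. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.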

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (指定的一个) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
CN1245936A
CLAIM 4
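. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.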

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (指定的一个) are provided to all of the reducers .
CN1245936A
CLAIM 4
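. A fixed-format word processing apparatus comprising a data buffer, an input section and an output section, wherein the improvement comprises: the data buffer being divided into: a document library storing data of different document folders, an association database storing associated item data, a layout library storing document layouts, and an item list menu storing item lists; control means, connected to the input section and the output section, for selecting among document editing and management, template creation, and search processing according to an input command received from the input section; document editing management means, connected to the control means and the data buffer, for performing at least one of document editing and document management on an existing document or a new document in a specified document folder of the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and character input, and the document management including at least one of document modification, deletion, saving and printing, the document editing management means parsing the processed data according to the different item types of the document data and storing the parsed processed data in the document library; association management means, connected to the document editing management means and the data buffer, for establishing associated data item management in the item list menu so as to permit a lookup in the association database of the associated data corresponding to entered item data, to produce associated data in the document, and to record related associated data in the association database; template editing management means, connected to the control means and the data buffer, for selecting, for modification, from an item list containing a plurality of document items and from the existing layouts in the item list menu and the layout library, and for defining new document items and designing a new layout containing these document items, the template editing management means storing the new document items and the new layout in the item list menu and the layout library respectively; data search means, connected to the control means and the data buffer, for receiving a specified [指定的一个] (first intermediate data set, intermediate data set) search condition from the control means through the input section, comparing at least one specified document folder retrieved from the document library with the search condition, and retrieving all documents in the document library that satisfy the search condition in the at least one specified document folder; the control means including means for combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document and outputting the complete document through the output section.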

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task (产生关联) , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
CN1245936A
CLAIM 2
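. A fixed-format word processing method for use in a computer system having a data buffer, an input section and an output section, the fixed-format word processing method comprising the following steps: dividing the data buffer into: a document library storing document folders of different types, wherein each document folder holds document data of the same format, an association database storing associated item data of documents, a layout library storing document formats, and an item list menu storing document items; providing a first command input to select among performing document editing and management, template creation, and search processing; if document editing and management is selected, performing, through a second command input, at least one of document editing and document management on a specified document folder in the document library to obtain processed data, the document editing including at least one of data copying, cutting, pasting, saving and item data entry, and the document management including at least one of document modification, deletion, saving and printing; parsing the processed data according to the different item types; and storing the parsed processed data in the document library; establishing associated data item management in the item list menu so that, when a document is edited, a lookup in the association database of associated data is permitted according to an entered primary key value in order to produce associated [产生关联] (combine task) data in the document, and so that mutually associated data is recorded in the association database; if template creation is selected, performing at least one of the following sub-steps: selecting, for modification, a layout from the layout library that already contains a large number of document items; selecting a predefined item list from the item list menu for redesigning a layout; and defining new document items and designing a new layout containing these new document items, the new document items and the new layout being stored in the item list menu and the layout library respectively; if search processing is selected, specifying a search condition through the input section; determining whether the search condition is to be applied to the entire document library or to a specified document folder; if the search is to be performed only in a specified document folder, retrieving the specified document folder from the document library into the data buffer for comparison; if the search is to be performed in the entire document library, comparing the contents of the document library with the search condition and retrieving all documents in the document library that satisfy the search condition; combining the corresponding document data, layout and item list of a document folder in the document library to form a specified complete document, and outputting the complete document through the output section.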




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1040434A1

Filed: 1998-12-21     Issued: 2000-10-04

Methods and apparatus for efficiently splitting query execution across client and server in an object-relational mapping

(Original Assignee) Linda G. Demichiel; Roderic G. G. Cattell     (Current Assignee) Sun Microsystems Inc

Linda G. Demichiel, Roderic G. G. Cattell
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (receiving step) are performed by a distributed system .
EP1040434A1
CLAIM 14
. The method of claim 1 , wherein the receiving step (reducing operations) comprises the step of receiving an OQL query from a user program .

EP1040434A1
CLAIM 68
. A method for performing object-based querying in a system having at least one non-object-based database management system , comprising the steps of : receiving an object-based query from a user program ;
separating the object-based query into a first server portion , a second server portion , and a client portion ;
transmitting the first server portion to a first data (first data) base management system ;
and transmitting the second server portion to a second database management system .
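
For comparison against the cited claims, the overall flow recited in claim 1 of US8190610B2 (per-group partitioning, group-specific map functions, and a reduce that merges on the common key) can be sketched in single-process Python as below. Everything named here is hypothetical and chosen for illustration: the two groups ("emp" and "dept"), their schemas, the round-robin partitioning, and the helper names. A real deployment would run the map and reduce tasks on separate machines.

```python
from collections import defaultdict

def map_employees(partition):
    # Hypothetical map function for group "emp", schema (dept_id, name).
    out = defaultdict(list)
    for dept_id, name in partition:
        out[dept_id].append(name)
    return out

def map_departments(partition):
    # Hypothetical map function for group "dept", schema (dept_id, title, floor).
    out = defaultdict(list)
    for dept_id, title, floor in partition:
        out[dept_id].append((title, floor))
    return out

def split(records, n_parts):
    # Round-robin partitioning; an assumption, any partitioning would do.
    return [records[i::n_parts] for i in range(n_parts)]

def run_group(records, map_fn, n_parts=2):
    # Apply the group's own map function to each partition and collect
    # the intermediate key -> list-of-values data for that group.
    intermediate = defaultdict(list)
    for part in split(records, n_parts):
        for key, values in map_fn(part).items():
            intermediate[key].extend(values)
    return intermediate

def reduce_join(intermediate_by_group):
    # Process each group's intermediate data in its own way and merge the
    # two groups on the key they have in common (dept_id).
    emp = intermediate_by_group["emp"]
    dept = intermediate_by_group["dept"]
    output = []
    for key in sorted(emp.keys() & dept.keys()):
        for name in emp[key]:
            for title, _floor in dept[key]:
                output.append((name, title))
    return output

employees = [(1, "ann"), (2, "bob"), (1, "cy")]
departments = [(1, "eng", 3), (2, "ops", 1), (3, "hr", 2)]
joined = reduce_join({
    "emp": run_group(employees, map_employees),
    "dept": run_group(departments, map_departments),
})
```

The output records have a schema (name, title) that differs from both input schemas, and department 3, which has no matching employee, does not appear in the result.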

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1040434A1
CLAIM 68
. A method for performing object-based querying in a system having at least one non-object-based database management system , comprising the steps of : receiving an object-based query from a user program ;
separating the object-based query into a first server portion , a second server portion , and a client portion ;
transmitting the first server portion to a first data (first data) base management system ;
and transmitting the second server portion to a second database management system .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (receiving step) are performed by a distributed system .
EP1040434A1
CLAIM 13
. The method of claim 2 , further comprising the steps of : receiving a command from a user program to update the memory ;
updating the memory in response to the command ;
receiving a second object-based query from the user program ;
separating the second object-based query into a second server portion and a second client portion ;
updating the database management system based on the contents of the memory ;
transmitting the second server portion to a database management system ;
obtaining a second set (second set) of data corresponding to the second server portion from the database management system ;
and aborting the updating of the database management system .

EP1040434A1
CLAIM 14
. The method of claim 1 , wherein the receiving step (reducing operations) comprises the step of receiving an OQL query from a user program .

EP1040434A1
CLAIM 21
. The method of claim 2 , wherein the forming step comprises the steps of : creating at least one object ;
and inserting the first set (first set) of data into at least one field of the at least one object .

EP1040434A1
CLAIM 68
. A method for performing object-based querying in a system having at least one non-object-based database management system , comprising the steps of : receiving an object-based query from a user program ;
separating the object-based query into a first server portion , a second server portion , and a client portion ;
transmitting the first server portion to a first data (first data) base management system ;
and transmitting the second server portion to a second database management system .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1040434A1
CLAIM 13
. The method of claim 2 , further comprising the steps of : receiving a command from a user program to update the memory ;
updating the memory in response to the command ;
receiving a second object-based query from the user program ;
separating the second object-based query into a second server portion and a second client portion ;
updating the database management system based on the contents of the memory ;
transmitting the second server portion to a database management system ;
obtaining a second set (second set) of data corresponding to the second server portion from the database management system ;
and aborting the updating of the database management system .

EP1040434A1
CLAIM 21
. The method of claim 2 , wherein the forming step comprises the steps of : creating at least one object ;
and inserting the first set (first set) of data into at least one field of the at least one object .

EP1040434A1
CLAIM 68
. A method for performing object-based querying in a system having at least one non-object-based database management system , comprising the steps of : receiving an object-based query from a user program ;
separating the object-based query into a first server portion , a second server portion , and a client portion ;
transmitting the first server portion to a first data (first data) base management system ;
and transmitting the second server portion to a second database management system .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1211769A

Filed: 1998-06-26     Issued: 1999-03-24

Method and apparatus for document retrieval based on Bayesian networks

(Original Assignee) Chinese University of Hong Kong CUHK     (Current Assignee) Chinese University of Hong Kong CUHK

黄永成, 秦桉
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (第二个) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (通过分析) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
CN1211769A
CLAIM 2
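. The method of claim 1, wherein the cost function is a dissimilarity measure between two distributions, the first distribution being the likelihood distribution of document keywords occurring in a generic document, and the second [第二个] (first data, first data group, first data set, s corresponding data partition) distribution also being a likelihood distribution of document keywords occurring in a generic document, except that the second distribution is approximated from the selected subject words.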

CN1211769A
CLAIM 23
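. A system for matching a request of a user against at least one document in a database, the matching being based on a set of subject words in an index structure stored for each document in the database, the system comprising: an analysis unit that, by analyzing [通过分析] (corresponding different intermediate data) the request, obtains request keywords satisfying a predetermined definition of in-domain keywords; a closeness computation unit that computes the closeness between the analyzed request and the subject words of each document in a batch of documents of the database, wherein the number of subject words used by the closeness computation unit is the same for each document in the batch of documents of the database; and a ranking unit that ranks the closeness of at least two documents in the batch of documents of the database.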

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (计算机中) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (第二个) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (通过分析) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1211769A
CLAIM 2
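. The method of claim 1, wherein the cost function is a dissimilarity measure between two distributions, the first distribution being the likelihood distribution of document keywords occurring in a generic document, and the second [第二个] (first data, first data group, first data set, s corresponding data partition) distribution also being a likelihood distribution of document keywords occurring in a generic document, except that the second distribution is approximated from the selected subject words.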

CN1211769A
CLAIM 17
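. A method, in a computer [计算机中] (computing devices), for obtaining subject words representing a document in a database, the document containing words, the subject words being used in a computer-based document retriever having a user interface, the method comprising the following steps: selecting a subset of the words to form non-duplicate document keywords; dividing the document keywords into ordered groups, from d1 to dc-1, each group uniquely corresponding to one of a set of pre-established keyword classes, the classes being labeled from 1 to a number c, from the most commonly used words to the least commonly used words in the database; finding, in each of the c-1 groups d1-dc of the most commonly used words, a document keyword ki that maximizes the following sum of mutual information: $\sum_{w_j \in d_{i+1}} I(W_j ; K_i)$, where i is the index of the group, Ki is a random variable corresponding to the presence or absence of keyword ki in a generic document, and Wj is a random variable corresponding to the presence or absence of keyword wj in a generic document; and then finding, in the c-th group of the least commonly used words, a document keyword kc that maximizes the mutual information I(Kc-1 , Kc), whereby the document keywords k1 , … , kc constitute the subject word list.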

CN1211769A
CLAIM 23
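. A system for matching a request of a user against at least one document in a database, the matching being based on a set of subject words in an index structure stored for each document in the database, the system comprising: an analysis unit that, by analyzing [通过分析] (corresponding different intermediate data) the request, obtains request keywords satisfying a predetermined definition of in-domain keywords; a closeness computation unit that computes the closeness between the analyzed request and the subject words of each document in a batch of documents of the database, wherein the number of subject words used by the closeness computation unit is the same for each document in the batch of documents of the database; and a ranking unit that ranks the closeness of at least two documents in the batch of documents of the database.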

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (第二个) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
CN1211769A
CLAIM 2
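. The method of claim 1 , wherein the cost function is a dissimilarity measure between two distributions , the first distribution being the likelihood distribution of document keyword occurrence in a regular document , and the second [第二个] (first data, first data group, first data set, s corresponding data partition) distribution also being the likelihood distribution of document keyword occurrence in a regular document , but the second distribution being approximated from the selected topic words .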
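
For orientation when weighing the mapping above, the flow recited in claim 33 of US8190610B2 can be sketched roughly as follows: two data sets with different schemas are partitioned and mapped by different map functions, and the two intermediate data sets are reduced together on their common key using one iterator per data group. This is only a single-process illustration under stated assumptions. The sample records, the field names (`dept_id`, `name`, `title`), and the helper names (`partition`, `run_map`, `reduce_join`, `grouped_map_reduce`) are hypothetical and do not come from the patent, and no distribution across machines is modeled.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key "dept_id".
employees = [("e1", {"name": "Ann", "dept_id": 10}),
             ("e2", {"name": "Bob", "dept_id": 20}),
             ("e3", {"name": "Cy", "dept_id": 10})]
departments = [("d1", {"dept_id": 10, "title": "Sales"}),
               ("d2", {"dept_id": 20, "title": "Ops"})]


def partition(records, n):
    """Split one data set into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the first data group: emit (dept_id, name)."""
    return [(rec["dept_id"], rec["name"]) for _, rec in part]


def map_department(part):
    """Map function for the second data group: emit (dept_id, title)."""
    return [(rec["dept_id"], rec["title"]) for _, rec in part]


def run_map(records, map_fn, group, n_parts=2):
    """Map each partition independently and keep the intermediate values
    identifiable to their data group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for part in partition(records, n_parts):
        for key, value in map_fn(part):
            intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, emp_iter, dept_iter):
    """Reduce one common key with a separate iterator per data group."""
    titles = list(dept_iter)
    return [(key, name, title) for name in emp_iter for title in titles]


def grouped_map_reduce():
    emp = run_map(employees, map_employee, "employee")
    dept = run_map(departments, map_department, "department")
    output = []
    for key in sorted(set(emp) | set(dept)):
        output.extend(reduce_join(key,
                                  iter(emp[key]["employee"]),
                                  iter(dept[key]["department"])))
    return output
```

In this sketch the output rows `(dept_id, name, title)` have a schema that differs from both inputs, which is the merging-on-a-common-key behavior the claim language describes.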

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (计算机中) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (第二个) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CN1211769A
CLAIM 2
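. The method of claim 1 , wherein the cost function is a dissimilarity measure between two distributions , the first distribution being the likelihood distribution of document keyword occurrence in a regular document , and the second [第二个] (first data, first data group, first data set, s corresponding data partition) distribution also being the likelihood distribution of document keyword occurrence in a regular document , but the second distribution being approximated from the selected topic words .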

CN1211769A
CLAIM 17
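. A method , in a computer [计算机中] (computing devices) , for obtaining topic words representing a document in a database , the document containing words , the topic words being applied to a computer-based document retriever having a user interface , the method comprising the steps of : selecting a subset of the words to form non-repeating document keywords ;
dividing the document keywords into ordered groups , from d1 to dc-1 , each group uniquely corresponding to one of a set of pre-established keyword classes , the classes being labelled from 1 to a number c , ordered in the database from the most commonly used words to the least commonly used words ;
from each of the groups d1-dc of the c-1 most commonly used words , finding a document keyword ki that maximizes the following sum of mutual information :
$\sum_{w_j \in d_{i+1}} I(W_j ; K_i)$
where i is the index of the group , K_i is a random variable corresponding to the presence or absence of keyword ki in a regular document , and W_j is a random variable corresponding to the presence or absence of keyword wj in a regular document ;
and then , in the c-th group of the least commonly used words , finding a document keyword kc that maximizes the mutual information I(Kc-1 , Kc) , wherein the document keywords k1 , … , kc are the topic word list .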




US6158044A

Filed: 1998-05-20     Issued: 2000-12-05

Proposal based architecture system

(Original Assignee) ePropose Inc     (Current Assignee) ePropose Inc

John J. Tibbetts
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (User Interface) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6158044A
CLAIM 4
. The process of claim 1 , further comprising the step of said coordinator communicating with user interfaces selected from a group consisting of an Internet Interface , Graphical User Interface (different intermediate data) (GUI) , Object Oriented User Interface (OOUI) , proprietary interface , bar code readers and keypads .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (rule base) , includes data (includes data) that is associated with another reducer .
US6158044A
CLAIM 15
. The process of claim 1 , further comprising the steps of said object : recognizing any of said data that has become stale data ;
and providing user options for responding to said stale data , said user options being selected from a group consisting of user correction , rule based (particular reducer) correction , and error flagging .

US6158044A
CLAIM 31
. The apparatus of claim 19 , wherein said object includes data (includes data) security selected from a group consisting of authorization , authentication , and digital signatures .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (rule base) , includes data (includes data) that is associated with that reducer .
US6158044A
CLAIM 15
. The process of claim 1 , further comprising the steps of said object : recognizing any of said data that has become stale data ;
and providing user options for responding to said stale data , said user options being selected from a group consisting of user correction , rule based (particular reducer) correction , and error flagging .

US6158044A
CLAIM 31
. The apparatus of claim 19 , wherein said object includes data (includes data) security selected from a group consisting of authorization , authentication , and digital signatures .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (User Interface) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6158044A
CLAIM 4
. The process of claim 1 , further comprising the step of said coordinator communicating with user interfaces selected from a group consisting of an Internet Interface , Graphical User Interface (different intermediate data) (GUI) , Object Oriented User Interface (OOUI) , proprietary interface , bar code readers and keypads .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (rule base) , includes data (includes data) that is associated with another reducer .
US6158044A
CLAIM 15
. The process of claim 1 , further comprising the steps of said object : recognizing any of said data that has become stale data ;
and providing user options for responding to said stale data , said user options being selected from a group consisting of user correction , rule based (particular reducer) correction , and error flagging .

US6158044A
CLAIM 31
. The apparatus of claim 19 , wherein said object includes data (includes data) security selected from a group consisting of authorization , authentication , and digital signatures .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (rule base) , includes data (includes data) that is associated with that reducer .
US6158044A
CLAIM 15
. The process of claim 1 , further comprising the steps of said object : recognizing any of said data that has become stale data ;
and providing user options for responding to said stale data , said user options being selected from a group consisting of user correction , rule based (particular reducer) correction , and error flagging .

US6158044A
CLAIM 31
. The apparatus of claim 19 , wherein said object includes data (includes data) security selected from a group consisting of authorization , authentication , and digital signatures .




EP0760500A1

Filed: 1996-08-12     Issued: 1997-03-05

Partitioning within a partition in a disk file storage system

(Original Assignee) Sun Microsystems Inc     (Current Assignee) Sun Microsystems Inc

Billy J. Fuller
US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .
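
To make the distinction among the partitioning limitations of claims 23, 24 and 25 of US8190610B2 concrete, the sketch below contrasts three ways of routing intermediate key/value pairs to reducer partitions. It assumes that "a separate one of the partitions" in claim 23 means each pair lands in exactly one partition, which is an interpretation rather than a settled construction. The function names, the integer keys, and the modulo routing rule are hypothetical choices made for the illustration.

```python
def partition_exclusive(pairs, n):
    """Claim 23 style (assumed reading): each key/value pair is provided
    to exactly one of the n partitions."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[key % n].append((key, value))  # integer keys assumed
    return parts


def partition_replicated(pairs, n, replicate):
    """Claim 24 style: pairs whose key is in `replicate` are provided to
    more than one partition (here, all of them); others go to one."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        targets = range(n) if key in replicate else [key % n]
        for t in targets:
            parts[t].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    """Claim 25 style: every key/value pair is provided to every partition."""
    return [list(pairs) for _ in range(n)]
```

Each resulting partition would then be handed to its own reducer. The storage-partition language of the EP0760500A1 claim does not address any of these routing choices, which is worth keeping in mind when assessing the mapping.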

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (different one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP0760500A1
CLAIM 5
The method of Claim 4 wherein said transforming step comprises the steps of : translating the raw addresses in the access request for a raw file into actual addresses in the transformed access request passed to the disk driver to access the raw file storage space ;
detecting whether the actual addresses for the raw file indicate one contiguous storage space or multiple noncontiguous chunks of contiguous storage space ;
if said detecting step detects one contiguous storage space for the raw file , generating a disk request based on the actual address ;
and if said detecting step detects multiple noncontiguous chunks of contiguous storage space , generating multiple disk requests with each disk request based on the actual address for a different one (first set) of the chunks of contiguous storage space .

EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .
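
The map and reduce steps recited in US8190610B2 claim 33 above can be illustrated with a minimal single-process sketch. It is not the patentee's implementation and it is not distributed. The employee and department schemas, the group tags "emp" and "dept", and every function name are hypothetical, chosen only to show two differently structured data sets being mapped separately and then reduced together on a key they have in common.

```python
from collections import defaultdict


def partition_data(records, n_partitions):
    """Split a data set into n_partitions slices (round-robin, illustrative only)."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def map_employees(partition):
    """Mapper for the first (hypothetical) schema: (emp_id, (name, dept_id))."""
    for _emp_id, (name, dept_id) in partition:
        # Emit the common key (dept_id) and tag the value with its data group.
        yield dept_id, ("emp", name)


def map_departments(partition):
    """Mapper for the second (hypothetical) schema: (dept_id, dept_name)."""
    for dept_id, dept_name in partition:
        yield dept_id, ("dept", dept_name)


def run_map(records, mapper, n_partitions=2):
    """Apply one mapper to every partition of one data set."""
    intermediate = []
    for part in partition_data(records, n_partitions):
        intermediate.extend(mapper(part))
    return intermediate


def reduce_join(first_intermediate, second_intermediate):
    """Reduce both intermediate sets together, iterating per key and per group."""
    grouped = defaultdict(lambda: {"emp": [], "dept": []})
    for key, (group, value) in first_intermediate + second_intermediate:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        # One loop per data group; the output schema (name, dept_name)
        # differs from both input schemas.
        for dept_name in grouped[key]["dept"]:
            for name in grouped[key]["emp"]:
                output.append((name, dept_name))
    return output


employees = [(1, ("Ann", 10)), (2, ("Bob", 20)), (3, ("Cy", 10))]
departments = [(10, "Sales"), (20, "Ops")]
joined = reduce_join(
    run_map(employees, map_employees),
    run_map(departments, map_departments),
)
print(joined)
```

In this sketch the group tag carried with each intermediate value is what lets the reducer treat the two intermediate data sets differently while still merging them on the shared key.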

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (different one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP0760500A1
CLAIM 5
The method of Claim 4 wherein said transforming step comprises the steps of : translating the raw addresses in the access request for a raw file into actual addresses in the transformed access request passed to the disk driver to access the raw file storage space ;
detecting whether the actual addresses for the raw file indicate one contiguous storage space or multiple noncontiguous chunks of contiguous storage space ;
if said detecting step detects one contiguous storage space for the raw file , generating a disk request based on the actual address ;
and if said detecting step detects multiple noncontiguous chunks of contiguous storage space , generating multiple disk requests with each disk request based on the actual address for a different one (first set) of the chunks of contiguous storage space .

EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .
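
The partitioning recited in US8190610B2 claims 42 and 43 above, where some or all intermediate key-value pairs reach more than one reducer, can be sketched as follows. This is a hedged illustration only. The function names, the `broadcast_keys` parameter, and the choice of `hash(key) % n_reducers` as the default routing rule are assumptions and do not come from the patent.

```python
def partition_with_replication(intermediate, n_reducers, broadcast_keys=frozenset()):
    """Route each (key, value) pair to one reducer partition, except pairs whose
    key is in broadcast_keys, which are copied to every partition (claim 42 style)."""
    partitions = [[] for _ in range(n_reducers)]
    for key, value in intermediate:
        if key in broadcast_keys:
            # Replicated pair: provided to more than one partition.
            for part in partitions:
                part.append((key, value))
        else:
            # Ordinary pair: provided to exactly one partition.
            partitions[hash(key) % n_reducers].append((key, value))
    return partitions


def broadcast_all(intermediate, n_reducers):
    """Provide every pair to every reducer (claim 43 style)."""
    return [list(intermediate) for _ in range(n_reducers)]


pairs = [(1, "a"), (2, "b"), (3, "c")]
parts = partition_with_replication(pairs, 3, broadcast_keys={2})
everywhere = broadcast_all(pairs, 2)
```

Replicating selected keys is one way a reducer can see reference data it needs without every pair being copied everywhere; `broadcast_all` is the limiting case in which nothing is filtered.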

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
EP0760500A1
CLAIM 7
Apparatus for use in a computer system (computer system) having a file system for creating and accessing a disk file partition within an existing partition , said apparatus comprising : a storage system controller for creating a raw file of a predetermined size in a data storage system , said raw file having the attributes of a partition except that said raw file may contain noncontiguous sections of storage space ;
a storage driver for accessing storage space in a data storage system based on an actual address for storage space in the storage system ;
a storage access control for translating an access request for a raw file to an actual address for a raw file and passing the actual address to said storage driver ;
and said storage driver for accessing the raw file storage space in response to the actual address for a raw file .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP0829049A2

Filed: 1996-05-21     Issued: 1998-03-18

Continuously available database server having multiple groups of nodes with minimum intersecting sets of database fragment replicas

(Original Assignee) Telenor ASA     (Current Assignee) Clustra Systems Inc

Öystein TORBJÖRNSEN, Svein-Olaf Hvasshovd
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (different cooling) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power supplies and different cooling (mapping functions) systems .

US8190610B2
CLAIM 17
. A computer system (computer system, different power) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (different cooling) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling (mapping functions) systems .
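
The fragment to node assignment equation recited in EP0829049A2 claim 1 above can be made concrete with a short sketch. The operator between (x div N_S) and Q(S) is garbled in the extracted claim text. The code reads it as multiplication, which is an assumption, and treats "div" as integer division. The function and parameter names are hypothetical.

```python
def assign_fragment(x, n_s, q_s, avail):
    """Return the data processor index y for fragment x in one group.

    Assumed reading of the claim equation (operator taken as multiplication):
        y = (x + (x div N_S) * Q(S)) modulo avail
    x     : fragment index, 0 <= x <= F-1
    n_s   : N_S, number of data processors in the group that store fragments
    q_s   : Q(S), integer in 0..N_S-1, distinct for each group
    avail : number of the N_S data processors that have not failed
    """
    if x < 0:
        raise ValueError("fragment index must be non-negative")
    if not 0 <= q_s <= n_s - 1:
        raise ValueError("Q(S) must lie between 0 and N_S - 1")
    if not 0 < avail <= n_s:
        raise ValueError("avail must lie between 1 and N_S")
    return (x + (x // n_s) * q_s) % avail


# Two groups of four healthy processors, each with its own Q(S).
group_a = [assign_fragment(x, n_s=4, q_s=0, avail=4) for x in range(8)]
group_b = [assign_fragment(x, n_s=4, q_s=1, avail=4) for x in range(8)]
print(group_a)  # [0, 1, 2, 3, 0, 1, 2, 3]
print(group_b)  # [0, 1, 2, 3, 1, 2, 3, 0]
```

Under this reading, a different Q(S) per group shifts the second and later rounds of fragments by a different amount in each group, so the two groups do not place the same sets of fragments on correspondingly numbered processors.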

US8190610B2
CLAIM 18
. The computer system (computer system, different power) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 19
. The computer system (computer system, different power) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 20
. The computer system (computer system, different power) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 21
. The computer system (computer system, different power) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 22
. The computer system (computer system, different power) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 23
. The computer system (computer system, different power) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
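
The partitioning recited in claim 23 (each key/value pair goes to exactly one partition, and each partition goes to its own reducer) can be sketched as follows. This is a generic hash-partitioning illustration, not the patent's or the reference's implementation; the helper names and the use of CRC32 as the hash are assumptions made for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to exactly one partition using a stable hash (CRC32, assumed)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition(pairs: List[Pair], num_partitions: int) -> List[List[Pair]]:
    """Place every key/value pair into exactly one partition."""
    partitions: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def reducer(partition_pairs: List[Pair]) -> Dict[str, int]:
    """A toy reducer that sums the values for each key in its own partition."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in partition_pairs:
        totals[key] += value
    return dict(totals)


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = partition(pairs, num_partitions=3)
outputs = [reducer(p) for p in parts]  # one reducer invocation per partition
```

Because the partition index depends only on the key, all pairs sharing a key reach the same reducer, and no pair is seen by more than one reducer.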
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 24
. The computer system (computer system, different power) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 25
. The computer system (computer system, different power) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
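
Claims 24 and 25 differ from claim 23 in that a key/value pair may be provided to more than one partition, up to all of them. A minimal sketch of that idea follows; the target-selection functions broadcast and two_neighbours are hypothetical examples chosen for illustration and are not described in the patent or in the cited reference.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[int, str]
Targets = Callable[[int, int], Iterable[int]]


def partition_replicated(pairs: List[Pair], num_partitions: int,
                         targets: Targets) -> List[List[Pair]]:
    """Copy each key/value pair into every partition chosen by `targets`."""
    partitions: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        for p in targets(key, num_partitions):
            partitions[p].append((key, value))
    return partitions


def broadcast(key: int, num_partitions: int) -> Iterable[int]:
    """Claim 25 style: every pair is provided to all of the partitions."""
    return range(num_partitions)


def two_neighbours(key: int, num_partitions: int) -> Iterable[int]:
    """Claim 24 style (hypothetical rule): each pair goes to two partitions."""
    first = key % num_partitions
    return sorted({first, (first + 1) % num_partitions})


pairs = [(0, "x"), (1, "y"), (2, "z")]
all_parts = partition_replicated(pairs, 3, broadcast)
some_parts = partition_replicated(pairs, 3, two_neighbours)
```

Each resulting partition would then be handed to a separate reducer, as in the single-partition case.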
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 26
. The computer system (computer system, different power) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 27
. The computer system (computer system, different power) of claim 26 , wherein : the reducing includes processing the metadata .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 28
. The computer system (computer system, different power) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 29
. The computer system (computer system, different power) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 30
. The computer system (computer system, different power) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 31
. The computer system (computer system, different power) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 32
. The computer system (computer system, different power) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system, different power) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (different cooling) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (different one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones (first set) of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling (mapping functions) systems .
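
As an aid to reading claim 33 of US8190610B2 above, the following is a minimal single-process sketch of mapping two data groups with different schemas independently and then reducing them together on a common key. It is an illustration under stated assumptions, not the patentee's implementation: the example schemas (employees and departments), every function name, and the round-robin partitioning are hypothetical, and a real system would distribute the partitions across machines.

```python
from collections import defaultdict


def map_employees(record):
    # First schema (assumed for illustration): (dept_id, employee_name)
    dept_id, name = record
    yield dept_id, name


def map_departments(record):
    # Second schema (assumed for illustration): (dept_id, dept_name, city)
    dept_id, dept_name, city = record
    yield dept_id, (dept_name, city)


def partition(data, n):
    # Round-robin split of one data set into n data partitions.
    return [data[i::n] for i in range(n)]


def run_map(data, map_fn, group, n_partitions=2):
    # Apply the group's map function to each partition independently and
    # tag every intermediate key-value pair with its data group.
    intermediate = []
    for part in partition(data, n_partitions):
        for record in part:
            for key, value in map_fn(record):
                intermediate.append((key, group, value))
    return intermediate


def reduce_join(key, employee_values, department_values):
    # Iterate separately over each group's values for the shared key.
    # The output schema (name, dept_name, city) differs from both inputs.
    for dept_name, city in department_values:
        for name in employee_values:
            yield (name, dept_name, city)


def map_reduce(employees, departments):
    grouped = defaultdict(lambda: {"emp": [], "dept": []})
    pairs = run_map(employees, map_employees, "emp")
    pairs += run_map(departments, map_departments, "dept")
    for key, group, value in pairs:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        output.extend(reduce_join(key, grouped[key]["emp"], grouped[key]["dept"]))
    return output


if __name__ == "__main__":
    employees = [(1, "Ann"), (2, "Bob"), (1, "Cy")]
    departments = [(1, "Eng", "SF"), (2, "Ops", "NY")]
    print(map_reduce(employees, departments))
```

The per-group lists passed to `reduce_join` stand in for the separate iterators over each group's intermediate values described in the abstract of US8190610B2.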

US8190610B2
CLAIM 40
. A computer system (computer system, different power) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (different cooling) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (different one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones (first set) of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling (mapping functions) systems .

US8190610B2
CLAIM 41
. The computer system (computer system, different power) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 42
. The computer system (computer system, different power) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 43
. The computer system (computer system, different power) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 44
. The computer system (computer system, different power) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F^S_x , for storing said F fragments in the data processors in each said group , where for a particular fragment F^S_x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F^S_x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F^S_x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 45
. The computer system (computer system, different power) of claim 44 , wherein the reducing includes processing the metadata .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F s x , for storing said F fragments in the data processors in each said group , where for a particular fragment F s x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F s x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F s x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .

US8190610B2
CLAIM 46
. The computer system (computer system, different power) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
EP0829049A2
CLAIM 1
. A multiprocessor computer system (computer system) , comprising : N data processors , wherein N is a positive integer greater than three , each data processor having its own , separate , central processing unit , memory for storing database tables and other data structures , and communication channels for communication with other ones of said N data processors ;
each of said N data processors independently executing a distinct instruction data stream ;
at least a plurality of said N data processors including a communications processor for receiving transaction requests and for transmitting responses thereto ;
said N data processors being divided into at least two groups , each having at least two data processors ;
each data processor including : fragmenting means for fragmenting each of said database tables into N fragments , and for storing replicas of each fragment in different ones of said N data processors , wherein said different ones of said N data processors are in different ones of said groups of data processors such that a complete copy of each of said database tables is located within each said group of data processors and such that simultaneous failure of all data processors in either of said groups would leave a complete copy of each of said database tables in the other of said groups of data processors ;
a data dictionary that stores information indicating where each said replica of each fragment of said database tables is stored among said N data processors ;
said fragmenting means further adapted for changing the information stored in said data dictionary upon failure of any one of said N data processors to indicate that the replicas stored on the failed data processor are not available , and for regenerating said replicas on the failed data processor in non-failed ones , if any , of the data processors in the same group of data processors as the failed data processor ;
and said fragmenting means further adapted for dividing said database tables into F fragments F s x , for storing said F fragments in the data processors in each said group , where for a particular fragment F s x , S identifies the group of data processors in which the fragment is stored and x is an index that identifies the fragment and has a value between 0 and F-1 ;
said fragmenting means adapted to assign each fragment F s x to a data processor y in group S in accordance with the following fragment to node assignment equation : y = (x + (x div N_S) · Q(S)) modulo avail , where y identifies which data processor fragment F s x is assigned to , N_S is the number of data processors in group S used for storing database fragments , Q(S) is an integer between 0 and N_S - 1 where Q(S) is a distinct value for each said group , and avail is the number of said N_S data processors that have not failed .

EP0829049A2
CLAIM 2
. The multiprocessor computer system of claim 1 , wherein each of said groups of data processors have different power (computer system) supplies and different cooling systems .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JPH07319923A

Filed: 1995-04-03     Issued: 1995-12-08

Method and apparatus for processing a parallel database in a multiprocessor computer system

(Original Assignee) At & T Global Inf Solutions Internatl Inc; エイ・ティ・アンド・ティ グローバル インフォメーション ソルーションズ インターナショナル インコーポレイテッド     

G Stellwagen Richard Jr, ジー.ステルワゲン ジュニア. リチャード
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JPH07319923A
CLAIM 3
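[Claim 3] A computerized (コンピュ , processing data) database storage and retrieval system comprising : (a) one or more processors ; and (b) one or more disk drives storing one or more databases , wherein each of said databases is partitioned according to a partitioning type selected from the group comprising range , hash , and schema partitioning ; and further comprising (c) a navigation server for receiving a request for a database operation from an operator and for converting said request into a plurality of parallel standard query language statements ; and (d) a plurality of database servers , in communication with the navigation server , for receiving said parallel standard query language statements from said navigation server process , executing each of said parallel standard query language statements to generate preliminary results , combining said preliminary results to generate a final result , and returning said final result to said operator ; (e) wherein each of said database servers manages one or more of said database partitions , and each standard query language statement references one or more database partitions , so that said standard query language statements are executed concurrently and each database partition is accessed concurrently , thereby reducing the overall access time of said request .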
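
To make the mapping and reducing steps of US8190610B2 claim 1 concrete, the following is a minimal single-process sketch. It assumes two hypothetical data groups, "employees" (tuple records) and "departments" (dictionary records), that have different schemas but share a department id as the common key. All names and data are illustrative assumptions and are not taken from the patent or from the cited reference, and the distribution across machines is not simulated.

```python
from collections import defaultdict


def map_employees(record):
    """Mapper for the 'employees' group; assumed schema: (dept_id, name)."""
    dept_id, name = record
    yield dept_id, name


def map_departments(record):
    """Mapper for the 'departments' group; assumed schema: {'id', 'title'}."""
    yield record["id"], record["title"]


def run_map(groups, mappers):
    """Apply each group's own mapper to each of that group's partitions.

    The intermediate data stays identifiable to its group:
    intermediate[key][group] is the list of values emitted for that key.
    """
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, partitions in groups.items():
        for partition in partitions:
            for record in partition:
                for key, value in mappers[group](record):
                    intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Reduce one key with a separate iteration per group, merging on the key."""
    titles = list(values_by_group.get("departments", []))
    for name in values_by_group.get("employees", []):
        for title in titles:
            yield key, name, title


def map_reduce(groups, mappers):
    intermediate = run_map(groups, mappers)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


if __name__ == "__main__":
    groups = {
        "employees": [[(1, "Ann"), (2, "Bob")], [(1, "Cy")]],
        "departments": [[{"id": 1, "title": "Eng"}], [{"id": 2, "title": "Ops"}]],
    }
    mappers = {"employees": map_employees, "departments": map_departments}
    print(map_reduce(groups, mappers))
```

The output rows have a schema, (dept_id, name, title), that differs from both input schemas, which is the merging on the common key that the claim language describes.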

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JPH07319923A
CLAIM 3
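[Claim 3] A computerized (コンピュ , processing data) database storage and retrieval system comprising : (a) one or more processors ; and (b) one or more disk drives storing one or more databases , wherein each of said databases is partitioned according to a partitioning type selected from the group comprising range , hash , and schema partitioning ; and further comprising (c) a navigation server for receiving a request for a database operation from an operator and for converting said request into a plurality of parallel standard query language statements ; and (d) a plurality of database servers , in communication with the navigation server , for receiving said parallel standard query language statements from said navigation server process , executing each of said parallel standard query language statements to generate preliminary results , combining said preliminary results to generate a final result , and returning said final result to said operator ; (e) wherein each of said database servers manages one or more of said database partitions , and each standard query language statement references one or more database partitions , so that said standard query language statements are executed concurrently and each database partition is accessed concurrently , thereby reducing the overall access time of said request .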

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JPH07319923A
CLAIM 3
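[Claim 3] A computerized (コンピュ , processing data) database storage and retrieval system comprising : (a) one or more processors ; and (b) one or more disk drives storing one or more databases , wherein each of said databases is partitioned according to a partitioning type selected from the group comprising range , hash , and schema partitioning ; and further comprising (c) a navigation server for receiving a request for a database operation from an operator and for converting said request into a plurality of parallel standard query language statements ; and (d) a plurality of database servers , in communication with the navigation server , for receiving said parallel standard query language statements from said navigation server process , executing each of said parallel standard query language statements to generate preliminary results , combining said preliminary results to generate a final result , and returning said final result to said operator ; (e) wherein each of said database servers manages one or more of said database partitions , and each standard query language statement references one or more database partitions , so that said standard query language statements are executed concurrently and each database partition is accessed concurrently , thereby reducing the overall access time of said request .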

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JPH07319923A
CLAIM 3
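[Claim 3] A computerized (コンピュ , processing data) database storage and retrieval system comprising : (a) one or more processors ; and (b) one or more disk drives storing one or more databases , wherein each of said databases is partitioned according to a partitioning type selected from the group comprising range , hash , and schema partitioning ; and further comprising (c) a navigation server for receiving a request for a database operation from an operator and for converting said request into a plurality of parallel standard query language statements ; and (d) a plurality of database servers , in communication with the navigation server , for receiving said parallel standard query language statements from said navigation server process , executing each of said parallel standard query language statements to generate preliminary results , combining said preliminary results to generate a final result , and returning said final result to said operator ; (e) wherein each of said database servers manages one or more of said database partitions , and each standard query language statement references one or more database partitions , so that said standard query language statements are executed concurrently and each database partition is accessed concurrently , thereby reducing the overall access time of said request .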
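
The arrangement recited in JPH07319923A claim 3, in which a navigation server turns one request into parallel query statements that run against separate database partitions and the preliminary results are then combined, can be sketched as follows. This is an illustration under stated assumptions only: hash partitioning on an integer key, one in-memory SQLite database standing in for each database server, and a hypothetical `orders` table. None of these names or choices come from the reference.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical statement issued once per partition.
STATEMENT = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"


def hash_partition(rows, n_partitions):
    """Split rows into partitions by hashing the integer key in column 0."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[row[0] % n_partitions].append(row)
    return partitions


def run_statement(partition_rows):
    """Stand-in for one database server: load one partition, run the statement."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", partition_rows)
        return conn.execute(STATEMENT).fetchall()
    finally:
        conn.close()


def navigate(rows, n_partitions):
    """Stand-in for the navigation step: fan out, then combine preliminary results."""
    partitions = hash_partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        preliminary = list(pool.map(run_statement, partitions))
    final = {}
    for result in preliminary:
        for customer_id, total in result:
            final[customer_id] = final.get(customer_id, 0) + total
    return dict(sorted(final.items()))


if __name__ == "__main__":
    print(navigate([(1, 10), (2, 5), (1, 7), (3, 2)], n_partitions=2))
```

Each worker opens its own connection, so no SQLite connection is shared between threads. Because rows are partitioned on the grouping key, every customer falls into exactly one partition and the combine step reduces to a union of the preliminary results.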




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP0692121A1

Filed: 1994-03-25     Issued: 1996-01-17

File difference engine

(Original Assignee) Squibb Data Systems Inc     (Current Assignee) Squibb Data Systems Inc

Mark Squibb
US8190610B2
CLAIM 1
. A method of processing data of a data set (hash table) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (one file) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data, said memory) group and the data of the first data group is mapped differently than the data of the second data group so that different lists (said list) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP0692121A1
CLAIM 7
. A computer apparatus comprising a storage device , a file stored in said storage device , means for generating a token table signature of said file and storing it in said memory (second data) , said means for generating and storing a signature comprising means for generating first and second different hashing mathematical representations of fixed equal length character segments of said file .

EP0692121A1
CLAIM 19
. A combination comprising a memory , a signature and a first data (first data) file stored in said memory , said signature comprising a difference between said first data file and a second data (second data) file with respect to one another , said first and second data files each having successive segments of characters , said signature further comprising indexes of successive segments in said first and second data files and offsets indicating a displacement from a reference point of identical character segments in the first and second files .

EP0692121A1
CLAIM 22
. A combination according to claim 19 , wherein the differences of one file (data partitions) of said first and second data files with respect to the other reflect added information and the segments are of a fixed size , representations of added information associated with corresponding offsets differing by more than the segment size from the previous offset .

EP0692121A1
CLAIM 53
. The method of claim 50 further comprising , generating a list of difference signature between the original and updated files from a residue of said comparison , said residue corresponding to segments of the files that do not compare , and producing said copy of said second file from said list (different lists) and a copy of only one of said original and second files .

EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .
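
The segment matching described in EP0692121A1 claim 55 can be illustrated with a simplified sketch. Unlike the claim, which builds a hash table for each file and offsets the tables against one another, this version builds a single table for the older file and slides a window over the newer file; that substitution, the use of CRC-32 as the hash, and the function names `segment_table` and `matching_segments` are all assumptions made for brevity.

```python
import zlib


def segment_table(data: bytes, size: int) -> dict[int, int]:
    """Hash successive equal-length segments of data; map hash -> first offset."""
    table = {}
    for offset in range(0, len(data) - size + 1, size):
        table.setdefault(zlib.crc32(data[offset:offset + size]), offset)
    return table


def matching_segments(old: bytes, new: bytes, size: int) -> list[tuple[int, int]]:
    """Return (old_offset, new_offset) pairs for segments of new found in old."""
    table = segment_table(old, size)
    matches = []
    pos = 0
    while pos + size <= len(new):
        segment = new[pos:pos + size]
        old_offset = table.get(zlib.crc32(segment))
        # Compare the bytes as well, so a hash collision is never reported as a match.
        if old_offset is not None and old[old_offset:old_offset + size] == segment:
            matches.append((old_offset, pos))
            pos += size  # skip past the matched segment
        else:
            pos += 1     # shift the window by one character and retry
    return matches


if __name__ == "__main__":
    print(matching_segments(b"AAAABBBBCCCC", b"AAAAxxBBBBCCCC", size=4))
```

In the example, the two inserted bytes push the later segments of the new file out of alignment, and the one-byte shift finds them again at offsets 6 and 10.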

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (hash table) .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (hash table) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (one file) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data, said memory) group and the data of the first data group is mapped differently than the data of the second data group so that different lists (said list) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP0692121A1
CLAIM 7
. A computer apparatus comprising a storage device , a file stored in said storage device , means for generating a token table signature of said file and storing it in said memory (second data) , said means for generating and storing a signature comprising means for generating first and second different hashing mathematical representations of fixed equal length character segments of said file .

EP0692121A1
CLAIM 19
. A combination comprising a memory , a signature and a first data (first data) file stored in said memory , said signature comprising a difference between said first data file and a second data (second data) file with respect to one another , said first and second data files each having successive segments of characters , said signature further comprising indexes of successive segments in said first and second data files and offsets indicating a displacement from a reference point of identical character segments in the first and second files .

EP0692121A1
CLAIM 22
. A combination according to claim 19 , wherein the differences of one file (data partitions) of said first and second data files with respect to the other reflect added information and the segments are of a fixed size , representations of added information associated with corresponding offsets differing by more than the segment size from the previous offset .

EP0692121A1
CLAIM 53
. The method of claim 50 further comprising , generating a list of difference signature between the original and updated files from a residue of said comparison , said residue corresponding to segments of the files that do not compare , and producing said copy of said second file from said list (different lists) and a copy of only one of said original and second files .

EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (hash table) .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (hash table) having a plurality of first key-value pairs , wherein such first data set belongs to a first data (first data) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (one file) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data, said memory) set (respective segments) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (respective segments) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP0692121A1
CLAIM 5
. A combination according to claim 1 , wherein the second mathematical representation comprises a cyclic redundancy product of the characters of the respective segments (second set, second data set) of the file .

EP0692121A1
CLAIM 7
. A computer apparatus comprising a storage device , a file stored in said storage device , means for generating a token table signature of said file and storing it in said memory (second data) , said means for generating and storing a signature comprising means for generating first and second different hashing mathematical representations of fixed equal length character segments of said file .

EP0692121A1
CLAIM 19
. A combination comprising a memory , a signature and a first data (first data) file stored in said memory , said signature comprising a difference between said first data file and a second data (second data) file with respect to one another , said first and second data files each having successive segments of characters , said signature further comprising indexes of successive segments in said first and second data files and offsets indicating a displacement from a reference point of identical character segments in the first and second files .

EP0692121A1
CLAIM 22
. A combination according to claim 19 , wherein the differences of one file (data partitions) of said first and second data files with respect to the other reflect added information and the segments are of a fixed size , representations of added information associated with corresponding offsets differing by more than the segment size from the previous offset .

EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (hash table) so that the output data set is a merging of a portion of the first and second intermediate data set .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (hash table) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (hash table) are provided to all of the reducers .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (hash table) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data (first data) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (one file) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data, said memory) set (respective segments) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (respective segments) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP0692121A1
CLAIM 5
. A combination according to claim 1 , wherein the second mathematical representation comprises a cyclic redundancy product of the characters of the respective segments (second set, second data set) of the file .

EP0692121A1
CLAIM 7
. A computer apparatus comprising a storage device , a file stored in said storage device , means for generating a token table signature of said file and storing it in said memory (second data) , said means for generating and storing a signature comprising means for generating first and second different hashing mathematical representations of fixed equal length character segments of said file .

EP0692121A1
CLAIM 19
. A combination comprising a memory , a signature and a first data (first data) file stored in said memory , said signature comprising a difference between said first data file and a second data (second data) file with respect to one another , said first and second data files each having successive segments of characters , said signature further comprising indexes of successive segments in said first and second data files and offsets indicating a displacement from a reference point of identical character segments in the first and second files .

EP0692121A1
CLAIM 22
. A combination according to claim 19 , wherein the differences of one file (data partitions) of said first and second data files with respect to the other reflect added information and the segments are of a fixed size , representations of added information associated with corresponding offsets differing by more than the segment size from the previous offset .

EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (hash table) so that the output data set is a merging of a portion of the first and second intermediate data set .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (hash table) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (hash table) are provided to all of the reducers .
EP0692121A1
CLAIM 55
. A method for producing a first file representative of differences between second and third files , comprising generating first and second hash table (data set, value pairs, output data groups, output data set) s from of successive equal length segments of each of said second and third files , respectively , wherein a segment is a set of successive characters , comparing said first and second hash tables and successively offsetting the positions of the hash tables , with respect to said second and third files , to identify segments of said second and third files that match one another , and producing said first file by listing segments of said second and third files that match one another .




JPH07141394A

Filed: 1993-11-16     Issued: 1995-06-02

Database partitioning management method and parallel database system

(Original Assignee) Hitachi Ltd

Kazuo Masai, Shunichi Torii, Masashi Tsuchida
US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (のキー) of a different schema than the iterator corresponding to another particular data group , for that reducer .
JPH07141394A
CLAIM 10
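[Claim 10] The database partitioning management method according to claim 8, characterized in that, when increasing the number of processors assigned to a BES node, the number of processors assigned to an IOS node, or the number of disks: if the system is online, the key [のキー] (different key) range of the database table managed by the processor or disk targeted by the addition is blocked; a processor or disk is newly assigned; lock information and directory information are handed over; the dictionary information needed for node distribution control is rewritten; and thereafter, if the system is online, the block is released.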

US8190610B2
CLAIM 17
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JPH07141394A
CLAIM 7
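[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.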

US8190610B2
CLAIM 18
. The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JPH07141394A
CLAIM 7
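[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.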

US8190610B2
CLAIM 19
. The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JPH07141394A
CLAIM 7
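[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.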

US8190610B2
CLAIM 20
. The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JPH07141394A
CLAIM 7
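[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.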

US8190610B2
CLAIM 21
. The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JPH07141394A
CLAIM 7
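[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.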

US8190610B2
CLAIM 22
. The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (のキー) of a different schema than the iterator corresponding to another particular data group , for that reducer .
JPH07141394A
CLAIM 7
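[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.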

JPH07141394A
CLAIM 10
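[Claim 10] The database partitioning management method according to claim 8, characterized in that, when increasing the number of processors assigned to a BES node, the number of processors assigned to an IOS node, or the number of disks: if the system is online, the key [のキー] (different key) range of the database table managed by the processor or disk targeted by the addition is blocked; a processor or disk is newly assigned; lock information and directory information are handed over; the dictionary information needed for node distribution control is rewritten; and thereafter, if the system is online, the block is released.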

US8190610B2
CLAIM 23
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JPH07141394A
CLAIM 7
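[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.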

US8190610B2
CLAIM 24
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JPH07141394A
CLAIM 7
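[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.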

US8190610B2
CLAIM 25
. The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JPH07141394A
CLAIM 7
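[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.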

US8190610B2
CLAIM 26
. The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JPH07141394A
CLAIM 7
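[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.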

US8190610B2
CLAIM 27
. The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JPH07141394A
CLAIM 7
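[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.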

US8190610B2
CLAIM 28
. The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JPH07141394A
CLAIM 7
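[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.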

US8190610B2
CLAIM 29
. The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
JPH07141394A
CLAIM 7
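[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.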

US8190610B2
CLAIM 30
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JPH07141394A
CLAIM 7
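[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.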

US8190610B2
CLAIM 31
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JPH07141394A
CLAIM 7
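[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.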

US8190610B2
CLAIM 32
. The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JPH07141394A
CLAIM 7
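[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.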

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (行うこと) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JPH07141394A
CLAIM 7
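[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.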

US8190610B2
CLAIM 40
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JPH07141394A
CLAIM 7
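[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.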

US8190610B2
CLAIM 41
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
JPH07141394A
CLAIM 7
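[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.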

US8190610B2
CLAIM 42
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JPH07141394A
CLAIM 7
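[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.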

US8190610B2
CLAIM 43
. The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
JPH07141394A
CLAIM 7
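[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.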

US8190610B2
CLAIM 44
. The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JPH07141394A
CLAIM 7
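[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.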

US8190610B2
CLAIM 45
. The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JPH07141394A
CLAIM 7
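[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.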

US8190610B2
CLAIM 46
. The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JPH07141394A
CLAIM 7
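[Claim 7] The database partitioning management method according to any one of claims 1 to 6, characterized by: calculating an optimum page access count m; when key range partitioning is used, calculating the number of stored pages per sub-key range, s (= m/p); partitioning into sub-key ranges in units of s pages; and performing [行うこと] (computer system) data insertion to disk.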




JPH0698770A

Filed: 1993-06-30     Issued: 1994-04-12

Searching for token sequences in a token string database

(Original Assignee) Internatl Business Mach Corp <Ibm> (International Business Machines Corporation)

Andrea Califano
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JPH0698770A
CLAIM 8
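[Claim 8] A computer [コンピュ] (processing data) system for recognizing and accessing a reference sequence of tokens in one or more original token sequences in a database, the computer system comprising: a database having a set of original token sequences; means for creating one or more original tuples for each original token sequence in the database by (a) dividing each original token sequence into two or more contiguous original subsequences and (b) forming one or more original tuples associated with each original sequence by appending two or more non-contiguous original subsequences of the original sequence; a unique original index for each original tuple created from an original sequence, the unique original index being associated with the original sequence from which the original tuple was created; a first memory lookup structure having cells that contain information associated with the original sequence from which the original tuple was created and that are accessed by the original index; one or more reference tuples created from the reference sequence of tokens by (c) dividing the reference sequence of tokens into two or more contiguous reference subsequences of tokens and (d) forming one or more reference tuples by appending two or more non-contiguous reference subsequences; a unique reference index for each reference tuple, created in the same manner as the original index, one or more of the reference indexes being compared with one or more of the original indexes; a second memory lookup structure for detecting matches between the reference indexes and the original indexes; and means for selecting an original token sequence in the database based on the number of matches between the one or more reference indexes and the one or more original indexes.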

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JPH0698770A
CLAIM 8
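[Claim 8] A computer [コンピュ] (processing data) system for recognizing and accessing a reference sequence of tokens in one or more original token sequences in a database, the computer system comprising: a database having a set of original token sequences; means for creating one or more original tuples for each original token sequence in the database by (a) dividing each original token sequence into two or more contiguous original subsequences and (b) forming one or more original tuples associated with each original sequence by appending two or more non-contiguous original subsequences of the original sequence; a unique original index for each original tuple created from an original sequence, the unique original index being associated with the original sequence from which the original tuple was created; a first memory lookup structure having cells that contain information associated with the original sequence from which the original tuple was created and that are accessed by the original index; one or more reference tuples created from the reference sequence of tokens by (c) dividing the reference sequence of tokens into two or more contiguous reference subsequences of tokens and (d) forming one or more reference tuples by appending two or more non-contiguous reference subsequences; a unique reference index for each reference tuple, created in the same manner as the original index, one or more of the reference indexes being compared with one or more of the original indexes; a second memory lookup structure for detecting matches between the reference indexes and the original indexes; and means for selecting an original token sequence in the database based on the number of matches between the one or more reference indexes and the one or more original indexes.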

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JPH0698770A
CLAIM 8
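[Claim 8] A computer [コンピュ] (processing data) system for recognizing and accessing a reference sequence of tokens in one or more original token sequences in a database, the computer system comprising: a database having a set of original token sequences; means for creating one or more original tuples for each original token sequence in the database by (a) dividing each original token sequence into two or more contiguous original subsequences and (b) forming one or more original tuples associated with each original sequence by appending two or more non-contiguous original subsequences of the original sequence; a unique original index for each original tuple created from an original sequence, the unique original index being associated with the original sequence from which the original tuple was created; a first memory lookup structure having cells that contain information associated with the original sequence from which the original tuple was created and that are accessed by the original index; one or more reference tuples created from the reference sequence of tokens by (c) dividing the reference sequence of tokens into two or more contiguous reference subsequences of tokens and (d) forming one or more reference tuples by appending two or more non-contiguous reference subsequences; a unique reference index for each reference tuple, created in the same manner as the original index, one or more of the reference indexes being compared with one or more of the original indexes; a second memory lookup structure for detecting matches between the reference indexes and the original indexes; and means for selecting an original token sequence in the database based on the number of matches between the one or more reference indexes and the one or more original indexes.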

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (有する第1) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JPH0698770A
CLAIM 8
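[Claim 8] A computer [コンピュ] (processing data) system for recognizing and accessing a reference sequence of tokens in one or more original token sequences in a database, the computer system comprising: a database having a set of original token sequences; means for creating one or more original tuples for each original token sequence in the database by (a) dividing each original token sequence into two or more contiguous original subsequences and (b) forming one or more original tuples associated with each original sequence by appending two or more non-contiguous original subsequences of the original sequence; a unique original index for each original tuple created from an original sequence, the unique original index being associated with the original sequence from which the original tuple was created; a first [有する第1] (second set) memory lookup structure having cells that contain information associated with the original sequence from which the original tuple was created and that are accessed by the original index; one or more reference tuples created from the reference sequence of tokens by (c) dividing the reference sequence of tokens into two or more contiguous reference subsequences of tokens and (d) forming one or more reference tuples by appending two or more non-contiguous reference subsequences; a unique reference index for each reference tuple, created in the same manner as the original index, one or more of the reference indexes being compared with one or more of the original indexes; a second memory lookup structure for detecting matches between the reference indexes and the original indexes; and means for selecting an original token sequence in the database based on the number of matches between the one or more reference indexes and the one or more original indexes.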

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (有する第1 , "having a first") of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JPH0698770A
CLAIM 8
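[Claim 8] A computer system for recognizing and accessing a reference string of tokens in one or more original token strings within a database , comprising : a database having a set of original token strings ; means for creating one or more original tuples for each original token string in the database by : a . partitioning each original token string into two or more contiguous original substrings , and b . forming one or more original tuples associated with each original string by appending together two or more non-contiguous original substrings of the original string ; a unique original index for each original tuple created from an original string , the unique original index being associated with the original string from which the original tuple was created ; a first (second set) memory look-up structure having cells , the cells containing information associated with the original string from which the original tuple was created and being accessed by the original index ; one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more contiguous reference substrings of tokens , and d . forming one or more reference tuples by appending together two or more non-contiguous reference substrings ; a unique reference index for each reference tuple , created in the same manner in which the original index was created , one or more reference indexes being compared with one or more original indexes ; a second memory look-up structure for tracking matches between the reference index and the original index ; and means for selecting an original token string in the database based on the number of matches between the one or more reference indexes and the one or more original indexes .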




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP0583559A1

Filed: 1993-05-24     Issued: 1994-02-23

Finding token sequences in a database of token strings

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Andrea Califano
US8190610B2
CLAIM 1
. A method of processing data of a data set (hash table) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (similar manner) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (different lengths) are performed by a distributed system .
EP0583559A1
CLAIM 3
A method of finding a reference string of tokens in one or more original token strings within a database , as in claim 1 , where the original substrings of tokens are of different lengths (reducing operations) .

EP0583559A1
CLAIM 10
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising the steps of : creating one or more original tuples for each of the original token strings in the database by : a . partitioning each original token string into two or more original substrings of contiguous tokens ;
b . appending together at least two original non-contiguous original substrings of the original string to form at least one original tuple associated with each original string ;
creating a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
using the original index to point to a cell in a first memory look-up structure and storing in the cell an information record associated with an original string ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of tokens into two or more reference substrings of contiguous tokens ;
d . appending together at least two non contiguous reference substrings to form at least one reference tuple ;
creating a unique reference index for each reference tuple in a similar manner (first data) to which the original index was created ;
comparing at least one reference index to at least one original index using the memory look-up structure ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original token string in the database based on the number of matches between one or more original indexes and one or more reference indexes .
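
As a rough illustration of the indexing steps recited in EP0583559A1 claim 10 above, the following Python sketch partitions token strings into contiguous substrings, appends non-adjacent substrings into tuples, and counts index matches. It is not taken from the reference: the function names (`make_tuples`, `build_index`, `find_best`), the fixed substring length, the pairing of exactly two substrings, and the use of the tuple itself as the index are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def make_tuples(tokens, sub_len=2):
    """Partition tokens into contiguous substrings, then append pairs of
    non-adjacent substrings to form tuples (steps a/b and c/d)."""
    subs = [tuple(tokens[i:i + sub_len]) for i in range(0, len(tokens), sub_len)]
    return [subs[i] + subs[j]
            for i, j in combinations(range(len(subs)), 2)
            if j - i > 1]  # skip adjacent (contiguous) substrings


def build_index(originals, sub_len=2):
    """First look-up structure: index -> ids of the original strings.
    Assumption: the tuple itself serves as the index (dict hashes it)."""
    table = defaultdict(set)
    for string_id, tokens in originals.items():
        for tup in make_tuples(tokens, sub_len):
            table[tup].add(string_id)
    return table


def find_best(table, reference, sub_len=2):
    """Second look-up structure: match count per original string.
    Returns the id with the most matches, or None if nothing matched."""
    votes = Counter()
    for tup in make_tuples(reference, sub_len):
        for string_id in table.get(tup, ()):
            votes[string_id] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical data: two original strings and a noisy reference string.
originals = {"a": list("ABCDEFGH"), "b": list("QRSTUVWX")}
table = build_index(originals)
best = find_best(table, list("ABXXEFGH"))  # selects "a" (two matching tuples)
```

Because tuples skip over intervening tokens, a reference string with a corrupted middle section can still match on the tuples that avoid the damaged substring, which is why the sketch selects by match count rather than by exact equality.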

EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .
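
For comparison with the cited reference, the operations recited in US8190610B2 claim 1 charted above can be sketched as follows. This is a single-process sketch under stated assumptions, not the patented implementation: the two data groups (`employees`, `departments`), their schemas, the round-robin `partition` helper, and all function names are hypothetical, and distribution across machines is not modeled.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key dept_id.
employees = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]  # (emp_id, name, dept_id)
departments = [(10, "Sales"), (20, "Ops")]                   # (dept_id, dept_name)


def map_employee(record):
    """Mapper for the employee group: emit (dept_id, name)."""
    _, name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Mapper for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def partition(records, n):
    """Split one data group into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def run_map(group, records, mapper, n_partitions=2):
    """Apply the group's own mapper to each partition and tag every
    intermediate pair with its group so it stays identifiable."""
    out = []
    for part in partition(records, n_partitions):
        for record in part:
            for key, value in mapper(record):
                out.append((key, group, value))
    return out


def reduce_join(intermediate):
    """Merge the groups on the common key, iterating each group separately."""
    by_key = defaultdict(lambda: defaultdict(list))
    for key, group, value in intermediate:
        by_key[key][group].append(value)
    output = []
    for key in sorted(by_key):
        for dept_name in by_key[key]["department"]:  # iterator for group 2
            for name in by_key[key]["employee"]:     # iterator for group 1
                output.append((name, dept_name))
    return output


intermediate = (run_map("employee", employees, map_employee)
                + run_map("department", departments, map_department))
result = reduce_join(intermediate)  # (name, dept_name) pairs: a third schema
```

The group tag carried on each intermediate pair is what lets the reduce step apply a different iteration to each group, which is the feature the chart attempts to map onto the reference's look-up structures.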

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (hash table) .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (contiguous amino acids) that is associated with another reducer .
EP0583559A1
CLAIM 27
A method for recognizing and accessing a reference string of amino acids in one or more original protein strings within a database comprising the steps of : creating one or more original tuples for each of the original protein strings in the database by : a . partitioning each original protein string into two or more substrings of contiguous amino acids (includes data) ;
b . forming at least one original tuple associated with each original protein string by appending together at least two non contiguous substrings of the original string ;
creating a unique original index for each original tuple created from an original protein string , the original index being associated with the original protein string from which the original tuple was created ;
storing the original index in a first memory look-up structure along with associated information ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of amino acids into two or more contiguous reference substrings of amino acids ;
d . forming at least one reference tuple by appending together at least two non contiguous reference substrings ;
creating a unique reference index for each reference tuple in a similar manner to which the original index was created ;
comparing at least one reference index to at least one original index ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original protein string in the database based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (contiguous amino acids) that is associated with that reducer .
EP0583559A1
CLAIM 27
A method for recognizing and accessing a reference string of amino acids in one or more original protein strings within a database comprising the steps of : creating one or more original tuples for each of the original protein strings in the database by : a . partitioning each original protein string into two or more substrings of contiguous amino acids (includes data) ;
b . forming at least one original tuple associated with each original protein string by appending together at least two non contiguous substrings of the original string ;
creating a unique original index for each original tuple created from an original protein string , the original index being associated with the original protein string from which the original tuple was created ;
storing the original index in a first memory look-up structure along with associated information ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of amino acids into two or more contiguous reference substrings of amino acids ;
d . forming at least one reference tuple by appending together at least two non contiguous reference substrings ;
creating a unique reference index for each reference tuple in a similar manner to which the original index was created ;
comparing at least one reference index to at least one original index ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original protein string in the database based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set (hash table) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (similar manner) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP0583559A1
CLAIM 10
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising the steps of : creating one or more original tuples for each of the original token strings in the database by : a . partitioning each original token string into two or more original substrings of contiguous tokens ;
b . appending together at least two original non-contiguous original substrings of the original string to form at least one original tuple associated with each original string ;
creating a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
using the original index to point to a cell in a first memory look-up structure and storing in the cell an information record associated with an original string ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of tokens into two or more reference substrings of contiguous tokens ;
d . appending together at least two non contiguous reference substrings to form at least one reference tuple ;
creating a unique reference index for each reference tuple in a similar manner (first data) to which the original index was created ;
comparing at least one reference index to at least one original index using the memory look-up structure ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original token string in the database based on the number of matches between one or more original indexes and one or more reference indexes .

EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups (hash table) .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .
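
The partitioning recited in US8190610B2 claim 23 above, where each intermediate key/value pair goes to exactly one partition and each partition to its own reducer, might look like the sketch below. The CRC32-based partitioner, the `(key, group, value)` layout, and the counting reducer are assumptions chosen for illustration; the claim does not prescribe any of them.

```python
import zlib
from collections import defaultdict


def partition_for(key, n_partitions):
    """Stable hash partitioner: a given key always maps to one partition."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_partitions


def shuffle(intermediate, n_partitions):
    """Place each (key, group, value) pair in exactly one partition."""
    parts = [[] for _ in range(n_partitions)]
    for key, group, value in intermediate:
        parts[partition_for(key, n_partitions)].append((key, group, value))
    return parts


def reducer(part):
    """Hypothetical reducer: count values per (key, group) in its partition."""
    counts = defaultdict(int)
    for key, group, _ in part:
        counts[(key, group)] += 1
    return dict(counts)


# Hypothetical intermediate data from two data groups.
pairs = [(10, "employee", "Ann"), (20, "employee", "Bob"),
         (10, "department", "Sales"), (20, "department", "Ops"),
         (30, "employee", "Di")]
parts = shuffle(pairs, 3)
outputs = [reducer(p) for p in parts]  # one reducer per partition
```

Partitioning on the key alone, rather than on the key and group together, keeps all groups' values for a given key in the same reducer, which a merge on the common key would need.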

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .
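
Illustrative note on US8190610B2 claim 24: the partitioning step recited there (some key/value pairs of the intermediate data go to more than one partition, and each partition goes to a separate reducer) can be sketched as below. This is a minimal single-process sketch, not the patentee's implementation. The names partition_index, partition_with_replication, reduce_partition and the replicated_keys parameter are hypothetical, and the CRC32-based hash is only an assumed way to get a stable partition number.

```python
import zlib
from collections import defaultdict


def partition_index(key, num_partitions):
    # Assumed partitioning rule: a stable CRC32 hash of the key, modulo the partition count.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_with_replication(pairs, num_partitions, replicated_keys=frozenset()):
    # Each pair goes to one partition by key hash, except pairs whose key is
    # listed in replicated_keys, which are copied to every partition.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        if key in replicated_keys:
            for part in partitions:
                part.append((key, value))
        else:
            partitions[partition_index(key, num_partitions)].append((key, value))
    return partitions


def reduce_partition(partition):
    # One reducer per partition: group the values it received by key.
    grouped = defaultdict(list)
    for key, value in partition:
        grouped[key].append(value)
    return dict(grouped)


intermediate = [("a", 1), ("b", 2), ("a", 3), ("config", 10)]
partitions = partition_with_replication(intermediate, 3, replicated_keys={"config"})
reducer_outputs = [reduce_partition(p) for p in partitions]
```

Passing every key in replicated_keys would send all pairs to all partitions, which is the variant recited in claim 25.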

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .
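
Illustrative note on US8190610B2 claim 28: the claim recites one iterator per data group, with the iterator handing that group's metadata to the reducing step. A minimal sketch follows, under the assumption that the metadata is a small dictionary with a "sorted" flag. The class GroupIterator, the function reduce_with_iterators, and the metadata layout are hypothetical and do not come from the patent.

```python
class GroupIterator:
    """Iterates over one data group's intermediate values for a key and
    carries that group's metadata (hypothetical structure)."""

    def __init__(self, group_name, values, metadata):
        self.group_name = group_name
        self.metadata = metadata
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


def reduce_with_iterators(key, iterators):
    # The reducer reads each group through its own iterator and uses the
    # metadata: values already marked as sorted are not sorted again.
    result = {}
    for it in iterators:
        if it.metadata.get("sorted", False):
            result[it.group_name] = list(it)
        else:
            result[it.group_name] = sorted(it)
    return key, result


employees = GroupIterator("employee", ["alice", "bob"], {"sorted": True})
departments = GroupIterator("department", ["sales", "engineering"], {"sorted": False})
key, summary = reduce_with_iterators("dept-7", [employees, departments])
```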

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (contiguous amino acids) that is associated with another reducer .
EP0583559A1
CLAIM 27
A method for recognizing and accessing a reference string of amino acids in one or more original protein strings within a database comprising the steps of : creating one or more original tuples for each of the original protein strings in the database by : a . partitioning each original protein string into two or more substrings of contiguous amino acids (includes data) ;
b . forming at least one original tuple associated with each original protein string by appending together at least two non contiguous substrings of the original string ;
creating a unique original index for each original tuple created from an original protein string , the original index being associated with the original protein string from which the original tuple was created ;
storing the original index in a first memory look-up structure along with associated information ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of amino acids into two or more contiguous reference substrings of amino acids ;
d . forming at least one reference tuple by appending together at least two non contiguous reference substrings ;
creating a unique reference index for each reference tuple in a similar manner to which the original index was created ;
comparing at least one reference index to at least one original index ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original protein string in the database based on the number of matches between one or more original indexes and one or more reference indexes .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (contiguous amino acids) that is associated with that reducer .
EP0583559A1
CLAIM 27
A method for recognizing and accessing a reference string of amino acids in one or more original protein strings within a database comprising the steps of : creating one or more original tuples for each of the original protein strings in the database by : a . partitioning each original protein string into two or more substrings of contiguous amino acids (includes data) ;
b . forming at least one original tuple associated with each original protein string by appending together at least two non contiguous substrings of the original string ;
creating a unique original index for each original tuple created from an original protein string , the original index being associated with the original protein string from which the original tuple was created ;
storing the original index in a first memory look-up structure along with associated information ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of amino acids into two or more contiguous reference substrings of amino acids ;
d . forming at least one reference tuple by appending together at least two non contiguous reference substrings ;
creating a unique reference index for each reference tuple in a similar manner to which the original index was created ;
comparing at least one reference index to at least one original index ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original protein string in the database based on the number of matches between one or more original indexes and one or more reference indexes .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data (similar manner) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set (hash table) having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (different lengths) are performed by a distributed system .
EP0583559A1
CLAIM 3
A method of finding a reference string of tokens in one or more original token strings within a database , as in claim 1 , where the original substrings of tokens are of different lengths (reducing operations) .

EP0583559A1
CLAIM 10
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising the steps of : creating one or more original tuples for each of the original token strings in the database by : a . partitioning each original token string into two or more original substrings of contiguous tokens ;
b . appending together at least two original non-contiguous original substrings of the original string to form at least one original tuple associated with each original string ;
creating a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
using the original index to point to a cell in a first memory look-up structure and storing in the cell an information record associated with an original string ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of tokens into two or more reference substrings of contiguous tokens ;
d . appending together at least two non contiguous reference substrings to form at least one reference tuple ;
creating a unique reference index for each reference tuple in a similar manner (first data) to which the original index was created ;
comparing at least one reference index to at least one original index using the memory look-up structure ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original token string in the database based on the number of matches between one or more original indexes and one or more reference indexes .

EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .
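
Illustrative note on US8190610B2 claim 33: the claimed flow (two data sets with different schemas, each partitioned and mapped separately, then reduced together on a common key into an output with a third schema) can be sketched as below. This is a single-process sketch only, so the distributed execution recited in the claim is not modeled. The employee and department records, the field layouts, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_employee(record):
    # First schema (assumed): (dept_id, last_name)
    dept_id, last_name = record
    yield dept_id, ("employee", last_name)


def map_department(record):
    # Second schema (assumed): (dept_id, dept_name)
    dept_id, dept_name = record
    yield dept_id, ("department", dept_name)


def run_map(records, map_fn, num_partitions=2):
    # Split the data set into partitions and apply the map function to each one.
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    intermediate = []
    for part in partitions:
        for record in part:
            intermediate.extend(map_fn(record))
    return intermediate


def reduce_groups(employee_pairs, department_pairs):
    # Reduce both intermediate data sets together, iterating on the common key.
    grouped = defaultdict(lambda: {"employee": [], "department": []})
    for key, (group, value) in employee_pairs + department_pairs:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        for name in grouped[key]["employee"]:
            for dept in grouped[key]["department"]:
                # Output schema differs from both inputs: (last_name, dept_id, dept_name)
                output.append((name, key, dept))
    return output


employees = [(34, "Smith"), (33, "Jones"), (34, "Robinson")]
departments = [(33, "Engineering"), (34, "Clerical"), (35, "Marketing")]
joined = reduce_groups(run_map(employees, map_employee), run_map(departments, map_department))
```

Department 35 has no employees, so it contributes no output rows. In this sketch the reduce behaves like an inner join on the common key.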

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (hash table) so that the output data set is a merging of a portion of the first and second intermediate data set .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (hash table) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (hash table) are provided to all of the reducers .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set (hash table) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (similar manner) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP0583559A1
CLAIM 10
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising the steps of : creating one or more original tuples for each of the original token strings in the database by : a . partitioning each original token string into two or more original substrings of contiguous tokens ;
b . appending together at least two original non-contiguous original substrings of the original string to form at least one original tuple associated with each original string ;
creating a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
using the original index to point to a cell in a first memory look-up structure and storing in the cell an information record associated with an original string ;
creating one or more reference tuples from the reference string of tokens by : c . partitioning the reference string of tokens into two or more reference substrings of contiguous tokens ;
d . appending together at least two non contiguous reference substrings to form at least one reference tuple ;
creating a unique reference index for each reference tuple in a similar manner (first data) to which the original index was created ;
comparing at least one reference index to at least one original index using the memory look-up structure ;
tracking the matches between the reference index and original index ;
storing the tracking results in a second memory look-up structure ;
selecting an original token string in the database based on the number of matches between one or more original indexes and one or more reference indexes .
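
The steps recited in EP0583559A1 claim 10 can be illustrated with a minimal sketch. It is not the reference's implementation: the function names, the choice of four substrings per string, and the use of a Python dict keyed by the tuple itself as the "unique index" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_into_substrings(tokens, parts=4):
    """Partition a token list into `parts` contiguous substrings."""
    size, extra = divmod(len(tokens), parts)
    out, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        out.append(tuple(tokens[start:end]))
        start = end
    return out


def make_tuples(tokens, parts=4):
    """Append pairs of non-contiguous substrings (positions >= 2 apart)."""
    subs = split_into_substrings(tokens, parts)
    return [
        (i, j, subs[i] + subs[j])
        for i, j in combinations(range(parts), 2)
        if j - i >= 2
    ]


def build_index(originals, parts=4):
    """First look-up structure: tuple key -> ids of the original strings."""
    table = defaultdict(set)
    for string_id, tokens in originals.items():
        for key in make_tuples(tokens, parts):
            table[key].add(string_id)
    return table


def best_match(reference, table, parts=4):
    """Second look-up structure: count index matches per original string."""
    matches = Counter()
    for key in make_tuples(reference, parts):
        for string_id in table.get(key, ()):
            matches[string_id] += 1
    return matches.most_common(1)[0][0] if matches else None
```

Because each tuple skips at least one substring, a reference string with a localized mismatch can still share a tuple with the original it came from, which is what the match count relies on.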

EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (hash table) so that the output data set is a merging of a portion of the first and second intermediate data set .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , and the at least one processor and memory further operable for : partitioning the first or second intermediate data set (hash table) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
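
The partitioning step of claim 42, in which some key-value pairs reach more than one partition, might look like the following sketch. The function name, the `broadcast_keys` parameter, and the CRC32-based assignment are assumptions for illustration and are not taken from the patent.

```python
from collections import defaultdict
import zlib


def partition_for_reducers(pairs, num_reducers, broadcast_keys=frozenset()):
    """Assign (key, value) pairs to reducer partitions.

    Keys in `broadcast_keys` are copied to every partition; every other key
    goes to exactly one partition chosen by a stable hash of the key.
    """
    partitions = defaultdict(list)
    for key, value in pairs:
        if key in broadcast_keys:
            targets = range(num_reducers)
        else:
            targets = [zlib.crc32(str(key).encode("utf-8")) % num_reducers]
        for r in targets:
            partitions[r].append((key, value))
    return partitions
```

Each resulting partition would then be handed to its own reducer; setting `broadcast_keys` to every key corresponds to the claim 43 case in which all pairs go to all reducers.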
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (hash table) are provided to all of the reducers .
EP0583559A1
CLAIM 20
A method for recognizing and accessing a reference string of tokens in one or more original token strings within a database , as in claim 10 , where the first look-up structure is a data structure that includes structures like a vector , array , and hash table (data set, value pairs, output data groups, output data set) .

EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
EP0583559A1
CLAIM 31
A computer system (computer system) for recognizing and accessing a reference string of tokens in one or more original token strings within a database comprising : a database having a set of original token strings ;
a means for creating at least one original tuple for each of the original token strings in the database , the tuple formed by : a . partitioning each original token string into two or more contiguous original substrings of tokens ;
b . forming at least one original tuple associated with each original string by appending together at least two original non-contiguous substrings of the original string ;
a unique original index for each original tuple created from an original string , the original index being associated with the original string from which the original tuple was created ;
a first memory look-up structure with cells , the cells being accessed by the original index and containing information associated with the original string from which the original tuple was created ;
one or more reference tuples created from the reference string of tokens by : c . partitioning the reference string of tokens into two or more non contiguous reference substrings of tokens ;
d . forming at least one reference tuple by appending together at least two reference substrings ;
a unique reference index for each reference tuple created in a similar manner to which the original index was created , the reference index compared to at least one reference index to at least one original index ;
a second memory look-up structure for tracking matches between the reference index and original index , an original token string in the database being selected based on the number of matches between one or more original indexes and one or more reference indexes .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CA2600344A1

Filed: 2006-03-02     Issued: 2006-09-08

Distribution of trust data

(Original Assignee) Markmonitor Inc.; Mark Shull; William Bohlman; Ihab Shraim; Christopher J. Bura; Markmonitor, Inc.     (Current Assignee) MarkMonitor Inc

Mark Shull, William Bohlman, Ihab Shraim, Christopher J. Bura
US8190610B2
CLAIM 1
. A method of processing data (cache server) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (communicatively couple, first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
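
As a reading aid for claim 1, the sketch below runs a per-group map step over two data groups with different schemas and merges the results on a shared key in the reduce step. It is a single-process illustration only, not a distributed system; the employee and department schemas and all function names are hypothetical.

```python
from collections import defaultdict


def map_employees(record):
    # Hypothetical schema: (emp_id, name, dept_id)
    emp_id, name, dept_id = record
    yield dept_id, name


def map_departments(record):
    # Hypothetical schema: (dept_id, dept_name)
    dept_id, dept_name = record
    yield dept_id, dept_name


def join_reducer(key, values_by_group):
    # One iteration per data group; merge on the key the groups share.
    for dept_name in values_by_group.get("departments", []):
        for name in values_by_group.get("employees", []):
            yield (name, dept_name)


def run_grouped_mapreduce(groups, mappers, reducer):
    """groups: {group_name: [records]}; mappers: {group_name: map function}."""
    # intermediate[key][group_name] keeps each value identifiable to the
    # data group it came from.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, records in groups.items():
        for record in records:
            for key, value in mappers[group_name](record):
                intermediate[key][group_name].append(value)
    output = []
    for key in sorted(intermediate):
        output.extend(reducer(key, intermediate[key]))
    return output
```

Keeping the intermediate values separated by group name is what lets the reducer treat each group in its own way before merging.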
CA2600344A1
CLAIM 18
. A method of distributing trust scores from a trust evaluation system , the method comprising : determining , at the trust evaluation system , a trust score for each of a plurality of online entities ;
populating , with the trust evaluation system , a trust database with the trust scores ;
and transmitting , from the trust evaluation system , at least a portion of the data included in the trust database to a cache server (processing data) .

CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively coupled (first data, first data set, output data set, includes data) with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first database (first data, first data set, output data set, includes data) having a first subset of a set of trust scores and a second database (second data) having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (cache server) that is not intermediate data .
CA2600344A1
CLAIM 18
. A method of distributing trust scores from a trust evaluation system , the method comprising : determining , at the trust evaluation system , a trust score for each of a plurality of online entities ;
populating , with the trust evaluation system , a trust database with the trust scores ;
and transmitting , from the trust evaluation system , at least a portion of the data included in the trust database to a cache server (processing data) .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with another reducer .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively coupled (first data, first data set, output data set, includes data) with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first database (first data, first data set, output data set, includes data) having a first subset of a set of trust scores and a second database having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with that reducer .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively coupled (first data, first data set, output data set, includes data) with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first database (first data, first data set, output data set, includes data) having a first subset of a set of trust scores and a second database having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (communicatively couple, first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively coupled (first data, first data set, output data set, includes data) with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first database (first data, first data set, output data set, includes data) having a first subset of a set of trust scores and a second database (second data) having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (cache server) that is not intermediate data .
CA2600344A1
CLAIM 18
. A method of distributing trust scores from a trust evaluation system , the method comprising : determining , at the trust evaluation system , a trust score for each of a plurality of online entities ;
populating , with the trust evaluation system , a trust database with the trust scores ;
and transmitting , from the trust evaluation system , at least a portion of the data included in the trust database to a cache server (processing data) .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with another reducer .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively coupled (first data, first data set, output data set, includes data) with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first database (first data, first data set, output data set, includes data) having a first subset of a set of trust scores and a second database having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with that reducer .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively coupled (first data, first data set, output data set, includes data) with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first database (first data, first data set, output data set, includes data) having a first subset of a set of trust scores and a second database having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (cache server) from a plurality of groups having different schema over a computer system , the method comprising : for a first data (communicatively couple, first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (communicatively couple, first data) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
CA2600344A1
CLAIM 18
. A method of distributing trust scores from a trust evaluation system , the method comprising : determining , at the trust evaluation system , a trust score for each of a plurality of online entities ;
populating , with the trust evaluation system , a trust database with the trust scores ;
and transmitting , from the trust evaluation system , at least a portion of the data included in the trust database to a cache server (processing data) .

CA2600344A1
CLAIM 21
. A method of distributing trust scores from a trust evaluation system evaluating online entities , the method comprising : retrieving a first plurality of trust scores from a trust data store , the first plurality of trust scores associated with a first set (first set) of online entities , each of the first plurality of trust scores evaluating an online entity included in the first set ;
retrieving a second plurality of trust scores from the trust data store , the second plurality of trust scores associated with a second set (second set) of online entities , each of the second plurality of trust scores evaluating an online entity included in the second set ;
transmitting , from the trust evaluation system , the first plurality of trust scores to a first trust score server ;
and transmitting , from the trust evaluation system , the second plurality of trust scores to a second trust score server .

CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively couple (first data, first data set, output data set, includes data) d with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first data (first data, first data set, output data set, includes data) base having a first subset of a set of trust scores and a second data (second data) base having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (communicatively couple, first data) is a merging of a portion of the first and second intermediate data set .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively couple (first data, first data set, output data set, includes data) d with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first data (first data, first data set, output data set, includes data) base having a first subset of a set of trust scores and a second database having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (communicatively couple, first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (communicatively couple, first data) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CA2600344A1
CLAIM 21
. A method of distributing trust scores from a trust evaluation system evaluating online entities , the method comprising : retrieving a first plurality of trust scores from a trust data store , the first plurality of trust scores associated with a first set (first set) of online entities , each of the first plurality of trust scores evaluating an online entity included in the first set ;
retrieving a second plurality of trust scores from the trust data store , the second plurality of trust scores associated with a second set (second set) of online entities , each of the second plurality of trust scores evaluating an online entity included in the second set ;
transmitting , from the trust evaluation system , the first plurality of trust scores to a first trust score server ;
and transmitting , from the trust evaluation system , the second plurality of trust scores to a second trust score server .

CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively couple (first data, first data set, output data set, includes data) d with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first data (first data, first data set, output data set, includes data) base having a first subset of a set of trust scores and a second data (second data) base having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (communicatively couple, first data) is a merging of a portion of the first and second intermediate data set .
CA2600344A1
CLAIM 29
. A trust authentication system comprising : a client application configured to communicate with online entities ;
and a monitoring agent communicatively couple (first data, first data set, output data set, includes data) d with the client application and configured to obtain trust scores for the online entities .

CA2600344A1
CLAIM 37
. The system of claim 36 , wherein : the plurality of databases comprises a first data (first data, first data set, output data set, includes data) base having a first subset of a set of trust scores and a second database having a second subset of the set of trust scores ;
the plurality of trust servers comprises a first trust server in communication with the first database and a second trust server in communication with the second database ;
the first trust server is designated an authoritative server with respect to the first subset of the set of trust scores ;
and the second trust server is designated an authoritative server with respect to the second subset of the set of trust scores .
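
For orientation only, the grouped map-reduce arrangement recited in claims 33, 34, 40 and 41 of US8190610B2 can be illustrated with a minimal sketch. The sketch is not taken from the patent or from any cited reference. The sample tables, the key name dept_id, and every function name (partition, map_employees, map_departments, run_map, reduce_join, map_reduce_join) are hypothetical and chosen purely for illustration. It runs in a single process, so the distributed-system limitation is not modelled.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share the key dept_id.
EMPLOYEES = [(1, "Ann"), (2, "Bob"), (1, "Cy")]        # (dept_id, employee_name)
DEPARTMENTS = [(1, "Search", "A"), (2, "Ads", "B")]    # (dept_id, dept_name, building)


def partition(records, n):
    """Split a data set into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    """Map function for the first data group: emit (dept_id, employee_name)."""
    for dept_id, name in part:
        yield dept_id, name


def map_departments(part):
    """Map function for the second data group: emit (dept_id, dept_name)."""
    for dept_id, dept_name, _building in part:
        yield dept_id, dept_name


def run_map(records, map_fn, n_partitions=2):
    """Apply map_fn to each partition and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return intermediate


def reduce_join(emp_values, dept_values):
    """Iterate over each group's values for one key and merge them."""
    for name in emp_values:
        for dept_name in dept_values:
            yield name, dept_name


def map_reduce_join():
    # Intermediate data stays identifiable to the data group it came from.
    inter = {
        "employees": run_map(EMPLOYEES, map_employees),
        "departments": run_map(DEPARTMENTS, map_departments),
    }
    output = []
    # Reduce only on keys that both groups have in common.
    common = inter["employees"].keys() & inter["departments"].keys()
    for key in sorted(common):
        output.extend(reduce_join(inter["employees"][key], inter["departments"][key]))
    return output
```

Calling map_reduce_join() on the sample data returns [('Ann', 'Search'), ('Cy', 'Search'), ('Bob', 'Ads')], that is, records in an output schema (employee_name, dept_name) that differs from both input schemas.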




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2006155663A

Filed: 2006-03-01     Issued: 2006-06-15

Method and system for performing Boolean operations on bit strings using a maximal bit slice

(Original Assignee) Sand Technology Systems Internatl Inc; サンド テクノロジー システムズ インターナショナル,インコーポレイティド     

Jean A Marquis, Michael W Mccool, エー. マークイス,ジャン, ダブリュ. マックール,マイケル
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2006155663A
CLAIM 1
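A method comprising the step of using a computer [コンピュ] (processing data) for running a relational database management system to perform a Boolean operation on bit strings and form a resultant bit string , wherein the bit strings represent organized data in the relational database management system , the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query , each bit string is divided into input bit slices , and the resultant bit string is divided into resultant bit slices , the step of using the computer further comprising the steps of : determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .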

JP2006155663A
CLAIM 19
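A system comprising : a relational database management system comprising a memory [メモリ] (different schema) that stores a plurality of bit strings , each of which represents organized data in the relational database management system and each of which is divided into input bit slices ;
means for performing a Boolean operation on one of the bit strings to form a resultant bit string , wherein the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query and the resultant bit string is divided into resultant bit slices ;
means for determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
means for selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and means for processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .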

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (施すステップ) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
JP2006155663A
CLAIM 17
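The method of claim 1 , further comprising : determining , based on the first input bit slice from the first bit string and the second input bit slice from the second bit string , whether a flip function is required ;
and , when the flip function is required , a step of applying [施すステップ] (partitioning step) the flip function to the first input bit slice or the second input bit slice within the processing step .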

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2006155663A
CLAIM 19
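A system comprising : a relational database management system comprising a memory [メモリ] (different schema) that stores a plurality of bit strings , each of which represents organized data in the relational database management system and each of which is divided into input bit slices ;
means for performing a Boolean operation on one of the bit strings to form a resultant bit string , wherein the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query and the resultant bit string is divided into resultant bit slices ;
means for determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
means for selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and means for processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .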

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2006155663A
CLAIM 1
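A method comprising the step of using a computer [コンピュ] (processing data) for running a relational database management system to perform a Boolean operation on bit strings and form a resultant bit string , wherein the bit strings represent organized data in the relational database management system , the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query , each bit string is divided into input bit slices , and the resultant bit string is divided into resultant bit slices , the step of using the computer further comprising the steps of : determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .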

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2006155663A
CLAIM 19
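A system comprising : a relational database management system comprising a memory [メモリ] (different schema) that stores a plurality of bit strings , each of which represents organized data in the relational database management system and each of which is divided into input bit slices ;
means for performing a Boolean operation on one of the bit strings to form a resultant bit string , wherein the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query and the resultant bit string is divided into resultant bit slices ;
means for determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
means for selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and means for processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .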

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2006155663A
CLAIM 19
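A system comprising : a relational database management system comprising a memory [メモリ] (different schema) that stores a plurality of bit strings , each of which represents organized data in the relational database management system and each of which is divided into input bit slices ;
means for performing a Boolean operation on one of the bit strings to form a resultant bit string , wherein the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query and the resultant bit string is divided into resultant bit slices ;
means for determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
means for selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and means for processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .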

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2006155663A
CLAIM 1
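A method comprising the step of using a computer [コンピュ] (processing data) for running a relational database management system to perform a Boolean operation on bit strings and form a resultant bit string , wherein the bit strings represent organized data in the relational database management system , the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query , each bit string is divided into input bit slices , and the resultant bit string is divided into resultant bit slices , the step of using the computer further comprising the steps of : determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .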

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema (メモリ) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2006155663A
CLAIM 1
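A method comprising the step of using a computer [コンピュ] (processing data) for running a relational database management system to perform a Boolean operation on bit strings and form a resultant bit string , wherein the bit strings represent organized data in the relational database management system , the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query , each bit string is divided into input bit slices , and the resultant bit string is divided into resultant bit slices , the step of using the computer further comprising the steps of : determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .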

JP2006155663A
CLAIM 19
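A system comprising : a relational database management system comprising a memory [メモリ] (different schema) that stores a plurality of bit strings , each of which represents organized data in the relational database management system and each of which is divided into input bit slices ;
means for performing a Boolean operation on one of the bit strings to form a resultant bit string , wherein the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query and the resultant bit string is divided into resultant bit slices ;
means for determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
means for selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and means for processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .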

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (メモリ) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2006155663A
CLAIM 19
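A system comprising : a relational database management system comprising a memory [メモリ] (different schema) that stores a plurality of bit strings , each of which represents organized data in the relational database management system and each of which is divided into input bit slices ;
means for performing a Boolean operation on one of the bit strings to form a resultant bit string , wherein the resultant bit string represents records in the relational database management system that satisfy the conditions of a user's query and the resultant bit string is divided into resultant bit slices ;
means for determining the action of the Boolean operation based on a first input bit slice from a first bit string representing organized data in the relational database management system and a second input bit slice from a second bit string representing organized data in the relational database management system ;
means for selecting , from the first input bit slice and the second input bit slice , the input bit slice having the longer bit length ;
and means for processing , in accordance with the determined action and up to the number of bits of the bit string having the longer bit length , the longer input bit slice and a plurality of input bit slices of the bit string having the input bit slice of shorter bit length , so as to form at least one resultant bit slice for the resultant bit string that satisfies the conditions of the user's query .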




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060200253A1

Filed: 2006-02-27     Issued: 2006-09-07

Internet appliance system and method

(Original Assignee) STEVEN M HOFFBERG 2004-1 GRAT     (Current Assignee) HOFFBERG FAMILY TRUST 1 ; STEVEN M HOFFBERG 2004-1 GRAT ; Blanding Hovenweep LLC

Steven Hoffberg, Linda Hoffberg-Borghesani
US8190610B2
CLAIM 1
. A method of processing data of a data set (one second, first data, one packet) over a distributed system , wherein the data set comprises a plurality of data groups (automated device) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (one second, first data, one packet) group has a different schema than the data of a second data (signal processor, digital audio, second data, remote device) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device (second data, second data group, output data groups, second data set, second set) , through said at least one data interface .

US20060200253A1
CLAIM 142
. The Internet appliance according to claim 133 , wherein said processor comprises a digital signal processor (second data, second data group, output data groups, second data set, second set) .

US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data (second data, second data group, output data groups, second data set, second set) format .

US20060200253A1
CLAIM 157
. The method according to claim 156 , further comprising decoding at least one of a digital video signal and a digital audio (second data, second data group, output data groups, second data set, second set) signal for presentation to the at least one of a video output and an audio output .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (automated device) (signal processor, digital audio, second data, remote device) .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device (second data, second data group, output data groups, second data set, second set) , through said at least one data interface .

US20060200253A1
CLAIM 142
. The Internet appliance according to claim 133 , wherein said processor comprises a digital signal processor (second data, second data group, output data groups, second data set, second set) .

US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data format and a second data (second data, second data group, output data groups, second data set, second set) format .

US20060200253A1
CLAIM 157
. The method according to claim 156 , further comprising decoding at least one of a digital video signal and a digital audio (second data, second data group, output data groups, second data set, second set) signal for presentation to the at least one of a video output and an audio output .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (real time) that is associated with another reducer .
US20060200253A1
CLAIM 146
. The Internet appliance according to claim 133 , wherein said processor executes a real time (includes data) operating system .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (real time) that is associated with that reducer .
US20060200253A1
CLAIM 146
. The Internet appliance according to claim 133 , wherein said processor executes a real time (includes data) operating system .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (automated device) .
US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (physical security) , the computer system configured to process data of a data set (one second, first data, one packet) , wherein the data set comprises a plurality of data groups (automated device) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (one second, first data, one packet) group has a different schema than the data of a second data (signal processor, digital audio, second data, remote device) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security (computing devices) system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device (second data, second data group, output data groups, second data set, second set) , through said at least one data interface .

US20060200253A1
CLAIM 142
. The Internet appliance according to claim 133 , wherein said processor comprises a digital signal processor (second data, second data group, output data groups, second data set, second set) .

US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data (second data, second data group, output data groups, second data set, second set) format .

US20060200253A1
CLAIM 157
. The method according to claim 156 , further comprising decoding at least one of a digital video signal and a digital audio (second data, second data group, output data groups, second data set, second set) signal for presentation to the at least one of a video output and an audio output .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (automated device) (signal processor, digital audio, second data, remote device) .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device (second data, second data group, output data groups, second data set, second set) , through said at least one data interface .

US20060200253A1
CLAIM 142
. The Internet appliance according to claim 133 , wherein said processor comprises a digital signal processor (second data, second data group, output data groups, second data set, second set) .

US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data format and a second data (second data, second data group, output data groups, second data set, second set) format .

US20060200253A1
CLAIM 157
. The method according to claim 156 , further comprising decoding at least one of a digital video signal and a digital audio (second data, second data group, output data groups, second data set, second set) signal for presentation to the at least one of a video output and an audio output .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (real time) that is associated with another reducer .
US20060200253A1
CLAIM 146
. The Internet appliance according to claim 133 , wherein said processor executes a real time (includes data) operating system .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (real time) that is associated with that reducer .
US20060200253A1
CLAIM 146
. The Internet appliance according to claim 133 , wherein said processor executes a real time (includes data) operating system .

US8190610B2
CLAIM 32
. The computer system of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (automated device) .
US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (one second, first data, one packet) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set (one second, first data, one packet) having a first set of resulting key-value pairs ;

for a second data (signal processor, digital audio, second data, remote device) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (signal processor, digital audio, second data, remote device) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device (second data, second data group, output data groups, second data set, second set) , through said at least one data interface .

US20060200253A1
CLAIM 142
. The Internet appliance according to claim 133 , wherein said processor comprises a digital signal processor (second data, second data group, output data groups, second data set, second set) .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data (second data, second data group, output data groups, second data set, second set) format .

US20060200253A1
CLAIM 157
. The method according to claim 156 , further comprising decoding at least one of a digital video signal and a digital audio (second data, second data group, output data groups, second data set, second set) signal for presentation to the at least one of a video output and an audio output .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .
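
As a reading aid for the target claim language only (not for the cited reference) , the steps recited in claim 33 of US8190610B2 can be sketched in a few lines of Python . This is a single-process illustration under stated assumptions , not the patentee's implementation : the employee and department records , the group tags "emp" and "dept" , and every function name below are hypothetical and do not come from the patent or from any MapReduce library .

```python
from collections import defaultdict

# Hypothetical inputs: two data groups with different schemas that share
# one key (dept_id). None of these names appear in the patent.
EMPLOYEES = [("e1", {"name": "Ann", "dept_id": 10}),
             ("e2", {"name": "Bob", "dept_id": 20}),
             ("e3", {"name": "Cy", "dept_id": 10})]
DEPARTMENTS = [(10, {"title": "Search"}), (20, {"title": "Ads"})]


def partition(records, n):
    """Split one data set into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    """Mapper for the first schema: emit (dept_id, name)."""
    return [(rec["dept_id"], rec["name"]) for _, rec in part]


def map_departments(part):
    """Mapper for the second schema: emit (dept_id, title)."""
    return [(dept_id, rec["title"]) for dept_id, rec in part]


def run_map(records, mapper, group, n=2):
    """Map each partition independently and tag the result with its group."""
    intermediate = defaultdict(list)
    for part in partition(records, n):
        for key, value in mapper(part):
            intermediate[key].append(value)
    return group, intermediate


def reduce_groups(tagged):
    """Iterate on the keys the two groups have in common and merge them."""
    by_group = dict(tagged)
    emp, dept = by_group["emp"], by_group["dept"]
    output = []
    for key in sorted(emp.keys() & dept.keys()):
        for title in dept[key]:      # iterator over the second group
            for name in emp[key]:    # iterator over the first group
                output.append((key, title, name))
    return output


result = reduce_groups([run_map(EMPLOYEES, map_employees, "emp"),
                        run_map(DEPARTMENTS, map_departments, "dept")])
print(result)
```

The output rows (dept_id , title , name) follow neither input schema , which is one way to read the "different schema" limitation on the output data set ; other readings are possible .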

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (one second, first data, one packet) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device , through said at least one data interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data format .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (one second, first data, one packet) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device , through said at least one data interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data format .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (one second, first data, one packet) are provided to all of the reducers .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device , through said at least one data interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data format .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .
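
The routing recited in claims 35 and 36 of US8190610B2 (some intermediate key-value pairs provided to more than one partition , or all pairs provided to all reducers) can likewise be illustrated with a short Python sketch . It assumes integer keys and a simple modulo rule for the default case ; the function name `route` and its parameters `broadcast_keys` and `broadcast_all` are hypothetical and are not taken from the patent or from any MapReduce library .

```python
def route(pairs, n_reducers, broadcast_keys=frozenset(), broadcast_all=False):
    """Assign intermediate (key, value) pairs to reducer partitions.

    By default a pair goes to exactly one partition (key modulo n_reducers).
    Pairs whose key is in broadcast_keys are copied to every partition, and
    broadcast_all=True copies every pair to every partition.
    """
    parts = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if broadcast_all or key in broadcast_keys:
            targets = range(n_reducers)
        else:
            targets = [key % n_reducers]
        for r in targets:
            parts[r].append((key, value))
    return parts


pairs = [(1, "a"), (2, "b"), (3, "c")]

# Claim 35 style: the pair with key 3 reaches more than one partition.
some = route(pairs, 2, broadcast_keys={3})

# Claim 36 style: every pair reaches every reducer.
everything = route(pairs, 2, broadcast_all=True)

print(some)
print(everything)
```

Each inner list would then be handed to a separate reducer ; the sketch stops at routing because the claims quoted here do not say more about what each reducer does with the replicated pairs .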

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (physical security) , the computer system configured to process data of a data set (one second, first data, one packet) , wherein the data set comprises a plurality of data groups (automated device) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (one second, first data, one packet) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (signal processor, digital audio, second data, remote device) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (signal processor, digital audio, second data, remote device) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security (computing devices) system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device (second data, second data group, output data groups, second data set, second set) , through said at least one data interface .

US20060200253A1
CLAIM 142
. The Internet appliance according to claim 133 , wherein said processor comprises a digital signal processor (second data, second data group, output data groups, second data set, second set) .

US20060200253A1
CLAIM 148
. The Internet appliance according to claim 133 , wherein said processor automatically communicates with an automated device (data groups) using the markup language interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data (second data, second data group, output data groups, second data set, second set) format .

US20060200253A1
CLAIM 157
. The method according to claim 156 , further comprising decoding at least one of a digital video signal and a digital audio (second data, second data group, output data groups, second data set, second set) signal for presentation to the at least one of a video output and an audio output .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (one second, first data, one packet) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device , through said at least one data interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data format .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (one second, first data, one packet) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device , through said at least one data interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data format .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (one second, first data, one packet) are provided to all of the reducers .
US20060200253A1
CLAIM 133
. An Internet appliance , comprising , within a single housing : (a) at least one first packet data network interface , adapted for communicating with the Internet ;
(b) at least one second (first data, first data set, data set) packet data network interface , adapted for communicating with a local area network ;
(c) at least one data interface selected from the group consisting of a universal serial bus (USB) , an IEEE-1394 interface , a voice telephony interface , an audio program interface , a video program interface , an audiovisual program interface , a camera interface , a physical security system interface , a wireless networking interface ;
a device control interface , smart home interface , an environmental sensing interface , and an environmental control interface ;
(d) at least one memory ;
and (e) a processor , for executing code stored in said at least one memory for causing said processor to control a data transfer between said local area network and the Internet , and defining a markup language interface communicated through at least one of said first and second packet data network interfaces to at least one of (i) control a data transfer ;
and (ii) control a remote device , through said at least one data interface .

US20060200253A1
CLAIM 150
. The Internet appliance according to claim 133 , further comprising a codec for interconverting media data between a first data (first data, first data set, data set) format and a second data format .

US20060200253A1
CLAIM 173
. A method , comprising : (a) providing at least one packet (first data, first data set, data set) data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a public network , at least one packet data network interface , adapted for bi-directionally communicating data packets according to an Internet Protocol with a private network , and a computer telephony interface ;
and (b) defining at least a remote virtual interface function , a data packet routing function , and a voice communication processing function for controlling the computer telephony interface .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060173926A1

Filed: 2006-02-27     Issued: 2006-08-03

Data transformation to maintain detailed user information in a data warehouse

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Kevin Kornelson, Murali Vajjiravel, Rajeev Prasad, Paul Clark, Brian Burdick, Tarek Najm
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .
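
For orientation, the map and reduce steps recited in claim 1 of US8190610B2 could be sketched roughly as below. This is an illustrative sketch only, not the patentee's implementation and not something disclosed in US20060173926A1. The employee and department records, the common key `dept_id`, and every function name are hypothetical and were chosen only to show two data groups with different schemas being mapped differently and then merged on a key in common.

```python
from collections import defaultdict

# Hypothetical data groups: different schemas, common key "dept_id".
EMPLOYEES = [(1, {"dept_id": 10, "name": "Ann"}),
             (2, {"dept_id": 20, "name": "Bob"}),
             (3, {"dept_id": 10, "name": "Cy"})]
DEPARTMENTS = [(10, {"dept_name": "Sales"}), (20, {"dept_name": "Ops"})]


def partition(records, n):
    """Split one data group's key-value pairs into n data partitions."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Group-specific map: emit lists of employee names keyed by dept_id."""
    out = defaultdict(list)
    for _emp_id, row in part:
        out[row["dept_id"]].append(row["name"])
    return out


def map_department(part):
    """Group-specific map: emit lists of department names keyed by dept_id."""
    out = defaultdict(list)
    for dept_id, row in part:
        out[dept_id].append(row["dept_name"])
    return out


def run_map(group_name, records, map_fn, n_parts=2):
    """Map every partition; keep the result identifiable to its data group."""
    intermediate = defaultdict(list)
    for part in partition(records, n_parts):
        for key, values in map_fn(part).items():
            intermediate[key].extend(values)
    return group_name, dict(intermediate)


def reduce_groups(tagged):
    """Merge the per-group intermediate data on the key they have in common."""
    by_group = dict(tagged)
    emp, dept = by_group["employee"], by_group["department"]
    output = []
    for key in sorted(emp.keys() & dept.keys()):
        for dept_name in dept[key]:      # iterate the department group's values
            for name in emp[key]:        # iterate the employee group's values
                output.append((key, dept_name, name))
    return output


def join_example():
    tagged = [run_map("employee", EMPLOYEES, map_employee),
              run_map("department", DEPARTMENTS, map_department)]
    return reduce_groups(tagged)
```

In this sketch everything runs in one process; the claim additionally requires that the mapping and reducing operations be performed by a distributed system, which the sketch does not attempt to model.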

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (different key, enable access) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20060173926A1
CLAIM 6
. The computer-readable media of claim 25 , wherein the process management component further partitions the received data records by assigning each of the data records to one of the plurality of partitions based on a different key (different key, groups having different schema) value associated with said data record to enable access (different key, groups having different schema) to said data record via the different key value .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (collection system) that is associated with another reducer .
US20060173926A1
CLAIM 38
. The computer-readable media of claim 25 , wherein the data management component loads the data records from the log files into the data warehousing and collection system (includes data) .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (collection system) that is associated with that reducer .
US20060173926A1
CLAIM 38
. The computer-readable media of claim 25 , wherein the data management component loads the data records from the log files into the data warehousing and collection system (includes data) .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (different key, enable access) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20060173926A1
CLAIM 6
. The computer-readable media of claim 25 , wherein the process management component further partitions the received data records by assigning each of the data records to one of the plurality of partitions based on a different key (different key, groups having different schema) value associated with said data record to enable access (different key, groups having different schema) to said data record via the different key value .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (collection system) that is associated with another reducer .
US20060173926A1
CLAIM 38
. The computer-readable media of claim 25 , wherein the data management component loads the data records from the log files into the data warehousing and collection system (includes data) .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (collection system) that is associated with that reducer .
US20060173926A1
CLAIM 38
. The computer-readable media of claim 25 , wherein the data management component loads the data records from the log files into the data warehousing and collection system (includes data) .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (different key, enable access) over a computer system , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060173926A1
CLAIM 6
. The computer-readable media of claim 25 , wherein the process management component further partitions the received data records by assigning each of the data records to one of the plurality of partitions based on a different key (different key, groups having different schema) value associated with said data record to enable access (different key, groups having different schema) to said data record via the different key value .

US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .
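
As a reading aid for claims 35 and 36 of US8190610B2, one possible way to partition intermediate data so that some key-value pairs reach more than one partition is sketched below. The sketch is an assumption-laden illustration, not the patent's or the reference's implementation: the function name `partition_intermediate`, the modulo rule for integer keys, and the choice to replicate the whole second intermediate data set are all hypothetical.

```python
def partition_intermediate(first, second, n_reducers):
    """Build one partition per reducer.

    Pairs of the first intermediate data set go to exactly one partition
    (key modulo n_reducers, assuming integer keys). Pairs of the second
    intermediate data set are copied to every partition, so those pairs
    are provided to more than one partition.
    """
    partitions = [{"first": [], "second": []} for _ in range(n_reducers)]
    for key, value in first:
        partitions[key % n_reducers]["first"].append((key, value))
    for pair in second:
        for part in partitions:
            part["second"].append(pair)
    return partitions


# Hypothetical intermediate data for two data groups.
parts = partition_intermediate([(1, "a"), (2, "b"), (3, "c")], [(1, "x")], 2)
```

Each entry of `parts` would then be handed to a separate reducer. Replicating the first data set in the same way would correspond to the situation of claim 36, in which all key-value pairs of both intermediate data sets are provided to all reducers.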

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20060173926A1
CLAIM 33
. A data collection and warehousing system receiving a plurality of individual log files from a plurality of servers , said log files each comprising a data record and at least one partition key value corresponding thereto , said system (data set, first data set, second data set) comprising : means for partitioning the received data records by assigning each of the data records to one of a plurality of partitions based on the partition key value corresponding to the data record , each of the partitions having one or more of the partition key values associated therewith ;
means for sorting the partitioned data records according to the corresponding partition key values and merging the sorted data records and corresponding partition key values with other data records and other corresponding partition key values , said other data records and other corresponding partition key values being previously received ;
and means for mapping each of the partition key values to another key value , said other key value representing a unit of information smaller than the partition key value associated with the merged data records .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060206507A1

Filed: 2006-02-16     Issued: 2006-09-14

Hierarchal data management

(Original Assignee) Dahbour Ziyad M     

Ziyad Dahbour
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (different partition) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data, time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060206507A1
CLAIM 19
. The method of claim 11 , wherein the storage policy is configured to map data to different partition (data partitions) tables responsive to a date of the data .

US20060206507A1
CLAIM 20
. A system comprising : a first data (first data, first data group) base partition stored on a first storage device and configured to store data , the first data being within a first date range ;
a second database partition stored on a second storage device and configured to store second data , the second data being within a second date range , the first storage device having a faster physical access time t (first data, first data group) han the second storage device , the second date range being prior to the first date range ;
a global data table comprising the first database partition and the second database partition , the first database partition and the second database partition being transparent to a user ;
partition meta data including a logical partitioning key configured for determining if data should be stored in alternatively the first database partition or the second database partition , the logical partition key being further configured for controlling the visibility of the first data and the second data to a user ;
and a data management policy configured for using the first database partition and the second database partition to archive the second data without removing the second data from the global data table .

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the method further comprises generating and providing metadata (meta data) for at least some of the mapping , partitioning , combining , grouping and sorting .
US20060206507A1
CLAIM 1
. A hierarchal data management system for a storage device , comprising : an entity relationship discover to generate meta data (providing metadata) from a business object ;
a file manager to create a partition based on said metadata ;
and a data mover to generate a logical partitioning key and to store the logical partitioning key in said metadata for said partition , said file manager including a data management policy to define a data class and a storage policy to map said data class to said storage device to form a partition table .
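
To make the "sort , group-by-key and combine task" language of claim 9 of US8190610B2 concrete, a reducer-side pipeline of that shape might look like the sketch below. It is a generic illustration under stated assumptions, not the patent's or US20060206507A1's implementation: the function name `sort_group_combine` is hypothetical, summation is an arbitrary choice of combine step, and the metadata generation that the claim also recites is not modelled.

```python
from itertools import groupby
from operator import itemgetter


def sort_group_combine(pairs, combine=sum):
    """Sort key-value pairs by key, group them by key, then combine each group.

    `combine` receives an iterable of the values for one key; summation is
    used here only as an example of a combine step.
    """
    ordered = sorted(pairs, key=itemgetter(0))           # sort
    grouped = groupby(ordered, key=itemgetter(0))        # group-by-key
    return [(key, combine(value for _key, value in group))  # combine
            for key, group in grouped]


# Hypothetical intermediate key-value pairs reaching one reducer.
result = sort_group_combine([("b", 1), ("a", 2), ("b", 3)])
```

The sort step matters here because `itertools.groupby` only groups adjacent items with equal keys.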

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (different partition) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data, time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060206507A1
CLAIM 19
. The method of claim 11 , wherein the storage policy is configured to map data to different partition (data partitions) tables responsive to a date of the data .

US20060206507A1
CLAIM 20
. A system comprising : a first database (first data, first data group) partition stored on a first storage device and configured to store data , the first data being within a first date range ;
a second database partition stored on a second storage device and configured to store second data , the second data being within a second date range , the first storage device having a faster physical access time than (first data, first data group) the second storage device , the second date range being prior to the first date range ;
a global data table comprising the first database partition and the second database partition , the first database partition and the second database partition being transparent to a user ;
partition meta data including a logical partitioning key configured for determining if data should be stored in alternatively the first database partition or the second database partition , the logical partition key being further configured for controlling the visibility of the first data and the second data to a user ;
and a data management policy configured for using the first database partition and the second database partition to archive the second data without removing the second data from the global data table .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data, time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (different partition) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060206507A1
CLAIM 19
. The method of claim 11 , wherein the storage policy is configured to map data to different partition (data partitions) tables responsive to a date of the data .

US20060206507A1
CLAIM 20
. A system comprising : a first database (first data, first data group) partition stored on a first storage device and configured to store data , the first data being within a first date range ;
a second database partition stored on a second storage device and configured to store second data , the second data being within a second date range , the first storage device having a faster physical access time than (first data, first data group) the second storage device , the second date range being prior to the first date range ;
a global data table comprising the first database partition and the second database partition , the first database partition and the second database partition being transparent to a user ;
partition meta data including a logical partitioning key configured for determining if data should be stored in alternatively the first database partition or the second database partition , the logical partition key being further configured for controlling the visibility of the first data and the second data to a user ;
and a data management policy configured for using the first database partition and the second database partition to archive the second data without removing the second data from the global data table .
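
For orientation only, the method recited in claim 33 of US8190610B2 can be sketched as follows. This is a minimal single-process Python illustration, not the patentee's implementation and not a claim construction. The record layouts (employee and department tuples), the function names `map_employee`, `map_department`, `reduce_join`, and `run_job`, and the round-robin partitioning are all assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
EMPLOYEES = [("alice", 1), ("bob", 2), ("carol", 1)]   # (name, dept_id)
DEPARTMENTS = [(1, "search"), (2, "ads")]               # (dept_id, dept_name)

def map_employee(record):
    name, dept_id = record
    yield dept_id, name

def map_department(record):
    dept_id, dept_name = record
    yield dept_id, dept_name

def partition(records, n):
    """Split one data set into n partitions (round-robin is an assumption)."""
    return [records[i::n] for i in range(n)]

def run_map(group, mapper, partitions):
    """Apply one mapper per partition; tag the output with its data group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for part in partitions:
        for record in part:
            for key, value in mapper(record):
                intermediate[key][group].append(value)
    return intermediate

def reduce_join(key, names, dept_names):
    """Iterate over each group's values for one key and merge them."""
    for dept_name in dept_names:
        for name in names:
            yield (name, key, dept_name)   # output schema differs from both inputs

def run_job(n_partitions=2):
    emp = run_map("emp", map_employee, partition(EMPLOYEES, n_partitions))
    dept = run_map("dept", map_department, partition(DEPARTMENTS, n_partitions))
    output = []
    for key in sorted(set(emp) | set(dept)):
        output.extend(reduce_join(key, emp[key]["emp"], dept[key]["dept"]))
    return output
```

Here `run_job()` joins the two groups on `dept_id`. In a distributed setting the map and reduce calls would run on separate machines, which this sketch does not model.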

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task , the method further comprises generating and providing metadata (meta data) for at least some of the mapping , partitioning , combining , grouping and sorting .
US20060206507A1
CLAIM 1
. A hierarchal data management system for a storage device , comprising : an entity relationship discover to generate meta data (providing metadata) from a business object ;
a file manager to create a partition based on said metadata ;
and a data mover to generate a logical partitioning key and to store the logical partitioning key in said metadata for said partition , said file manager including a data management policy to define a data class and a storage policy to map said data class to said storage device to form a partition table .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data, time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (different partition) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060206507A1
CLAIM 19
. The method of claim 11 , wherein the storage policy is configured to map data to different partition (data partitions) tables responsive to a date of the data .

US20060206507A1
CLAIM 20
. A system comprising : a first database (first data, first data group) partition stored on a first storage device and configured to store data , the first data being within a first date range ;
a second database partition stored on a second storage device and configured to store second data , the second data being within a second date range , the first storage device having a faster physical access time than (first data, first data group) the second storage device , the second date range being prior to the first date range ;
a global data table comprising the first database partition and the second database partition , the first database partition and the second database partition being transparent to a user ;
partition meta data including a logical partitioning key configured for determining if data should be stored in alternatively the first database partition or the second database partition , the logical partition key being further configured for controlling the visibility of the first data and the second data to a user ;
and a data management policy configured for using the first database partition and the second database partition to archive the second data without removing the second data from the global data table .

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata (meta data) for at least some of the mapping , partitioning , combining , grouping and sorting .
US20060206507A1
CLAIM 1
. A hierarchal data management system for a storage device , comprising : an entity relationship discover to generate meta data (providing metadata) from a business object ;
a file manager to create a partition based on said metadata ;
and a data mover to generate a logical partitioning key and to store the logical partitioning key in said metadata for said partition , said file manager including a data management policy to define a data class and a storage policy to map said data class to said storage device to form a partition table .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
WO2006060773A2

Filed: 2005-12-01     Issued: 2006-06-08

Computer systems and methods for visualizing data with generation of marks

(Original Assignee) Tableau Software Llc     

Patrick Hanrahan, Chris Stolte
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (different fields, more fields) than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
WO2006060773A2
CLAIM 11
. The method of claim 9 wherein said specification is expressed in a language based on a one or more fields (different schema, first schema) from the plurality of fields .

WO2006060773A2
CLAIM 15
. The method of claim 14 wherein each field in said plurality of fields has a plurality of levels , and wherein a first level from said plurality of levels is represented by a first component of said visual plot and wherein a second level from said plurality of levels is represented by a second component of said visual plot , wherein said first component and said second component are not the same as one another , and said first component and said second component may be from the same field or from different fields (different schema, first schema) .

WO2006060773A2
CLAIM 33
. The method of claim 32 wherein said time (second data, second data group) period is any one of : a year , a quarter , a month , a week , a day , an hour , a minute , or a second .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (different fields, more fields) than the iterator corresponding to another particular data group , for that reducer .
WO2006060773A2
CLAIM 11
. The method of claim 9 wherein said specification is expressed in a language based on a one or more fields (different schema, first schema) from the plurality of fields .

WO2006060773A2
CLAIM 15
. The method of claim 14 wherein each field in said plurality of fields has a plurality of levels , and wherein a first level from said plurality of levels is represented by a first component of said visual plot and wherein a second level from said plurality of levels is represented by a second component of said visual plot , wherein said first component and said second component are not the same as one another , and said first component and said second component may be from the same field or from different fields (different schema, first schema) .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (different fields, more fields) than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
WO2006060773A2
CLAIM 11
. The method of claim 9 wherein said specification is expressed in a language based on a one or more fields (different schema, first schema) from the plurality of fields .

WO2006060773A2
CLAIM 15
. The method of claim 14 wherein each field in said plurality of fields has a plurality of levels , and wherein a first level from said plurality of levels is represented by a first component of said visual plot and wherein a second level from said plurality of levels is represented by a second component of said visual plot , wherein said first component and said second component are not the same as one another , and said first component and said second component may be from the same field or from different fields (different schema, first schema) .

WO2006060773A2
CLAIM 33
. The method of claim 32 wherein said time (second data, second data group) period is any one of : a year , a quarter , a month , a week , a day , an hour , a minute , or a second .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (different fields, more fields) than the iterator corresponding to another particular data group , for that reducer .
WO2006060773A2
CLAIM 11
. The method of claim 9 wherein said specification is expressed in a language based on a one or more fields (different schema, first schema) from the plurality of fields .

WO2006060773A2
CLAIM 15
. The method of claim 14 wherein each field in said plurality of fields has a plurality of levels , and wherein a first level from said plurality of levels is represented by a first component of said visual plot and wherein a second level from said plurality of levels is represented by a second component of said visual plot , wherein said first component and said second component are not the same as one another , and said first component and said second component may be from the same field or from different fields (different schema, first schema) .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (different fields, more fields) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (different fields, more fields) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema (structured data) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (structured data) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
WO2006060773A2
CLAIM 5
. The method of claim 2 wherein the database is an unstructured database (second schema, second set) .

WO2006060773A2
CLAIM 11
. The method of claim 9 wherein said specification is expressed in a language based on a one or more fields (different schema, first schema) from the plurality of fields .

WO2006060773A2
CLAIM 15
. The method of claim 14 wherein each field in said plurality of fields has a plurality of levels , and wherein a first level from said plurality of levels is represented by a first component of said visual plot and wherein a second level from said plurality of levels is represented by a second component of said visual plot , wherein said first component and said second component are not the same as one another , and said first component and said second component may be from the same field or from different fields (different schema, first schema) .

WO2006060773A2
CLAIM 33
. The method of claim 32 wherein said time (second data, second data group) period is any one of : a year , a quarter , a month , a week , a day , an hour , a minute , or a second .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (different fields, more fields) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema (structured data) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (structured data) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (different fields, more fields) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
WO2006060773A2
CLAIM 5
. The method of claim 2 wherein the database is an unstructured database (second schema, second set) .

WO2006060773A2
CLAIM 11
. The method of claim 9 wherein said specification is expressed in a language based on a one or more fields (different schema, first schema) from the plurality of fields .

WO2006060773A2
CLAIM 15
. The method of claim 14 wherein each field in said plurality of fields has a plurality of levels , and wherein a first level from said plurality of levels is represented by a first component of said visual plot and wherein a second level from said plurality of levels is represented by a second component of said visual plot , wherein said first component and said second component are not the same as one another , and said first component and said second component may be from the same field or from different fields (different schema, first schema) .

WO2006060773A2
CLAIM 33
. The method of claim 32 wherein said time (second data, second data group) period is any one of : a year , a quarter , a month , a week , a day , an hour , a minute , or a second .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1761203A

Filed: 2005-11-03     Issued: 2006-04-19

Comprehensive analysis and monitoring system for online information security

(Original Assignee) Shanghai Jiaotong University     (Current Assignee) Shanghai Jiaotong University

李生红, 李建华, 林祥, 周日升, 周黎
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data (identification module) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
CN1761203A
CLAIM 1
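. A comprehensive analysis and monitoring system for online information security , comprising : an information data stream processing node , a role access control module , a protocol analysis and information capture module , a black/white list database , a data type identification module (s corresponding data partition to form corresponding intermediate data) , a passive text analysis module , a passive image analysis module , and a processing center , characterized in that the system further comprises : an image hidden-information detection module , an active text and image analysis module based on rating technology , and a post-event data analysis and processing module ; information to be processed first passes through the information data stream processing node , which forwards it to the role access control module ; the role access control module checks whether the information comes from a controlled information source and passes it to the protocol analysis and information capture module ; the protocol analysis and information capture module extracts the main content from the information and forwards it to the black/white list database ; the black/white list database filters the information stream and then sends the information along two paths , to the data type identification module and to the image hidden-information detection module ; the data type identification module checks the data type : if the information contains a rating label , the information is sent to the active text and image analysis module based on rating technology ; if the information is text , it is sent to the passive text analysis module ; if the information is an image , it is sent to the passive image analysis module ; the image hidden-information detection module detects whether an image contains hidden information and passes the information , according to its type , to the active text and image analysis module based on rating technology , the passive text analysis module , or the passive image analysis module ; the active text and image analysis module , the passive text analysis module , and the passive image analysis module each analyze the information and report their analysis results to the processing center ; the processing center , based on the aggregated results , instructs the information data stream processing node to block or release the information stream and notifies the post-event data analysis and processing module of the analysis results ; and the post-event data analysis and processing module updates the black/white list database according to the analysis results .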
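
The routing performed by the data type identification module in claim 1 of CN1761203A can be sketched as below. This is a minimal Python illustration under assumptions: the dictionary message format, the field names `rating_label` and `kind`, and the returned module labels are hypothetical and are not taken from the reference.

```python
def route(message):
    """Return the analysis module a message would be sent to (toy model)."""
    if message.get("rating_label"):
        # Messages carrying a rating label go to the active analysis module.
        return "active text/image analysis"
    if message.get("kind") == "text":
        return "passive text analysis"
    if message.get("kind") == "image":
        return "passive image analysis"
    return "unrouted"  # case not described in the claim; an assumption

examples = [
    {"kind": "text", "rating_label": "PG"},
    {"kind": "text"},
    {"kind": "image"},
]
routed = [route(m) for m in examples]
```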

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (further) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
CN1761203A
CLAIM 5
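. The comprehensive analysis and monitoring system for online information security according to claim 1 , characterized in that the protocol analysis and information capture module comprises a protocol analysis module and an information capture module ; the protocol analysis module is responsible for analyzing data transmitted under various protocols and sends the analysis results to the information capture module ; the information capture module reassembles the data packets under the various protocols according to protocol category and extracts the data requiring further (partitioning step) analysis .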

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data (identification module) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1761203A
CLAIM 1
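. A comprehensive analysis and monitoring system for online information security , comprising : an information data stream processing node , a role access control module , a protocol analysis and information capture module , a black/white list database , a data type identification module (s corresponding data partition to form corresponding intermediate data) , a passive text analysis module , a passive image analysis module , and a processing center , characterized in that the system further comprises : an image hidden-information detection module , an active text and image analysis module based on rating technology , and a post-event data analysis and processing module ; information to be processed first passes through the information data stream processing node , which forwards it to the role access control module ; the role access control module checks whether the information comes from a controlled information source and passes it to the protocol analysis and information capture module ; the protocol analysis and information capture module extracts the main content from the information and forwards it to the black/white list database ; the black/white list database filters the information stream and then sends the information along two paths , to the data type identification module and to the image hidden-information detection module ; the data type identification module checks the data type : if the information contains a rating label , the information is sent to the active text and image analysis module based on rating technology ; if the information is text , it is sent to the passive text analysis module ; if the information is an image , it is sent to the passive image analysis module ; the image hidden-information detection module detects whether an image contains hidden information and passes the information , according to its type , to the active text and image analysis module based on rating technology , the passive text analysis module , or the passive image analysis module ; the active text and image analysis module , the passive text analysis module , and the passive image analysis module each analyze the information and report their analysis results to the processing center ; the processing center , based on the aggregated results , instructs the information data stream processing node to block or release the information stream and notifies the post-event data analysis and processing module of the analysis results ; and the post-event data analysis and processing module updates the black/white list database according to the analysis results .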




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
WO2006050349A2

Filed: 2005-10-28     Issued: 2006-05-11

Methods and apparatus for running applications on computer grids

(Original Assignee) Hewlett-Packard Development Company, L.P.     

Fabricio Alves Barbosa Da Silva, Silvia Regina De Carvalho
US8190610B2
CLAIM 1
. A method of processing data of a data set (input file) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (ordered list) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
WO2006050349A2
CLAIM 6
. A method as claimed in any of claims 2 to 4 wherein the task queue corresponds to a size ordered list (different lists) of the tasks constituting the grid application .

WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .
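
For orientation when weighing the mapping above, the following is a minimal single-process sketch of the technique recited in claim 1 of US8190610B2: two data groups with different schemas are mapped by different map functions, the intermediate values stay identifiable to their group, and the reduce step merges them on the key they share. The example tables, the function names, and the round-robin partitioning are illustrative assumptions and do not come from either document; a real system would run the map and reduce calls on separate machines.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key dept_id.
EMPLOYEES = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cid", 10)]  # (emp_id, name, dept_id)
DEPARTMENTS = [(10, "Search"), (20, "Ads")]                   # (dept_id, dept_name)


def map_employee(record):
    """Map function for the first data group: emit (dept_id, name)."""
    _emp_id, name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Map function for the second data group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def partition(records, n):
    """Split one data group into n partitions (round-robin, an assumption)."""
    return [records[i::n] for i in range(n)]


def run_map(groups, n_partitions=2):
    """Apply each group's own map function to each of its partitions.

    The result keeps intermediate values identifiable to their group:
    intermediate[key][group_name] is a list of values.
    """
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, mapper) in groups.items():
        for part in partition(records, n_partitions):
            for record in part:
                for key, value in mapper(record):
                    intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Reduce one key by iterating over each group's values separately."""
    dept_names = values_by_group.get("departments", [])
    for name in values_by_group.get("employees", []):
        for dept_name in dept_names:
            yield key, name, dept_name


def map_reduce(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


result = map_reduce({
    "employees": (EMPLOYEES, map_employee),
    "departments": (DEPARTMENTS, map_department),
})
print(result)
```

The output rows have the schema (dept_id, name, dept_name), which differs from both input schemas. A size-ordered task queue, as in the WO2006050349A2 claims, plays no part in this flow.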

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (computing unit) that is associated with another reducer .
WO2006050349A2
CLAIM 13
. A method as claimed in any of claims 1 to 12 wherein the tasks are grouped according to a method of scheduling tasks among a plurality of computing unit (includes data) s , the method including the following steps : I) define the number of tasks to be assigned in groups to the computing units , where P is the number of computing units ;
II) compute the size of each task ;
III) rank the task files in a list L in order of increasing size ;
IV) for each group , beginning with the group with the largest number of tasks perform the following steps (a) to (e) : (i) assign the smallest unassigned task file to the group ;
(j) set the task file list position index equal to 1 ;
(k) while the group is not completely populated by task files perform the following steps : (i) if the position index plus P is less than or equal to the size of the list L , and the task file affinity between the task file at the position index and the task file at the position index +1 is less than a specified value k , then increment the position index by P ;
otherwise increment position index by 1 ;
(ii) assign to the group , the task file located at position in list L ;
(l) remove assigned task files from list L ;
(m) increment P = P - 1
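
The grouping steps of WO2006050349A2 claim 13 can be sketched as below. This is one possible reading of a claim whose text is partly garbled in the source, not the reference's implementation: the function name, the `affinity` callback, the zero-based indexing, and the early exit when the position index runs past the end of the list are all assumptions added for illustration.

```python
def group_tasks(task_sizes, group_sizes, affinity, k):
    """Group task files for P computing units, one group per unit.

    task_sizes  -- dict mapping task file name to its size (step II)
    group_sizes -- number of tasks wanted in each group (step I)
    affinity    -- hypothetical callback returning the affinity of two files
    k           -- affinity threshold named in the claim
    """
    units = len(group_sizes)                            # P
    remaining = sorted(task_sizes, key=task_sizes.get)  # list L, step III
    groups = []
    # Step IV: handle the group with the largest number of tasks first.
    for target in sorted(group_sizes, reverse=True):
        group = remaining[:1]   # (i) smallest unassigned task file
        pos = 0                 # (j) position index (1 in the claim, 0 here)
        while len(group) < target:                      # (k)
            can_stride = pos + units < len(remaining)
            if can_stride and affinity(remaining[pos], remaining[pos + 1]) < k:
                pos += units    # low affinity: skip ahead by P
            else:
                pos += 1        # otherwise take the neighbouring file
            if pos >= len(remaining):
                break           # assumption: stop when the list is exhausted
            group.append(remaining[pos])
        assigned = set(group)
        remaining = [t for t in remaining if t not in assigned]  # (l)
        groups.append(group)
        units -= 1                                      # (m) P = P - 1
    return groups


sizes = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
low_affinity = group_tasks(sizes, [3, 3], lambda x, y: 0.0, k=1.0)
high_affinity = group_tasks(sizes, [3, 3], lambda x, y: 1.0, k=1.0)
print(low_affinity)
print(high_affinity)
```

With low affinity the stride of P spreads small and large files across the groups; with high affinity neighbouring files stay together. Nothing in these steps involves reducers or intermediate key-value data.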

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (computing unit) that is associated with that reducer .
WO2006050349A2
CLAIM 13
. A method as claimed in any of claims 1 to 12 wherein the tasks are grouped according to a method of scheduling tasks among a plurality of computing unit (includes data) s , the method including the following steps : I) define the number of tasks to be assigned in groups to the computing units , where P is the number of computing units ;
II) compute the size of each task ;
III) rank the task files in a list L in order of increasing size ;
IV) for each group , beginning with the group with the largest number of tasks perform the following steps (a) to (e) : (i) assign the smallest unassigned task file to the group ;
(j) set the task file list position index equal to 1 ;
(k) while the group is not completely populated by task files perform the following steps : (i) if the position index plus P is less than or equal to the size of the list L , and the task file affinity between the task file at the position index and the task file at the position index +1 is less than a specified value k , then increment the position index by P ;
otherwise increment position index by 1 ;
(ii) assign to the group , the task file located at position in list L ;
(l) remove assigned task files from list L ;
(m) increment P = P - 1

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (input file) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (ordered list) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
WO2006050349A2
CLAIM 6
. A method as claimed in any of claims 2 to 4 wherein the task queue corresponds to a size ordered list (different lists) of the tasks constituting the grid application .

WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (computing unit) that is associated with another reducer .
WO2006050349A2
CLAIM 13
. A method as claimed in any of claims 1 to 12 wherein the tasks are grouped according to a method of scheduling tasks among a plurality of computing unit (includes data) s , the method including the following steps : I) define the number of tasks to be assigned in groups to the computing units , where P is the number of computing units ;
II) compute the size of each task ;
III) rank the task files in a list L in order of increasing size ;
IV) for each group , beginning with the group with the largest number of tasks perform the following steps (a) to (e) : (i) assign the smallest unassigned task file to the group ;
(j) set the task file list position index equal to 1 ;
(k) while the group is not completely populated by task files perform the following steps : (i) if the position index plus P is less than or equal to the size of the list L , and the task file affinity between the task file at the position index and the task file at the position index +1 is less than a specified value k , then increment the position index by P ;
otherwise increment position index by 1 ;
(ii) assign to the group , the task file located at position in list L ;
(l) remove assigned task files from list L ;
(m) increment P = P - 1

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (computing unit) that is associated with that reducer .
WO2006050349A2
CLAIM 13
. A method as claimed in any of claims 1 to 12 wherein the tasks are grouped according to a method of scheduling tasks among a plurality of computing unit (includes data) s , the method including the following steps : I) define the number of tasks to be assigned in groups to the computing units , where P is the number of computing units ;
II) compute the size of each task ;
III) rank the task files in a list L in order of increasing size ;
IV) for each group , beginning with the group with the largest number of tasks perform the following steps (a) to (e) : (i) assign the smallest unassigned task file to the group ;
(j) set the task file list position index equal to 1 ;
(k) while the group is not completely populated by task files perform the following steps : (i) if the position index plus P is less than or equal to the size of the list L , and the task file affinity between the task file at the position index and the task file at the position index +1 is less than a specified value k , then increment the position index by P ;
otherwise increment position index by 1 ;
(ii) assign to the group , the task file located at position in list L ;
(l) remove assigned task files from list L ;
(m) increment P = P - 1

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (input file) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (input file) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
WO2006050349A2
CLAIM 3
. A method of running an application on a computational grid comprising a plurality of computational units , the application comprised of a plurality of tasks , the method including the steps of : A) grouping the tasks according to the total number of computational units and total number of tasks based on an initial determination or assumption in respect of the relative processing power of the computational units constituting the computational grid ;
B) scheduling groups of tasks on computational units of the computational grid using a task queue ;
C) while there remain uncompleted tasks perform step D) ;
D) when a computational unit Pi completes the execution of at least one task , perform the following steps (second set, reduce method) (a) to (d) : (a) compute the mean execution time for the completed task on computational unit Pi ;
(b) update the task queue ;
(c) abort any still running replicas of the completed tasks ;
(d) if computational unit Pj is idle perform the following steps (i) if there are unfinished tasks on slower computational units then replicate the unfinished tasks on computational unit Pj ;
E) end

WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (input file) so that the output data set is a merging of a portion of the first and second intermediate data set .
WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (input file) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (input file) are provided to all of the reducers .
WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .
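
As background for claims 35 and 36 of US8190610B2, the sketch below shows what distinguishes them from conventional hash partitioning: some or all intermediate key-value pairs are delivered to more than one reducer partition. The routing functions, the integer keys, and the `broadcast_keys` set are hypothetical names chosen for illustration; neither document specifies them.

```python
def partition_intermediate(pairs, n_reducers, route):
    """Distribute intermediate (key, value) pairs across reducer partitions.

    route(key, n_reducers) returns the indices of every partition that
    should receive the pair, so one pair may land in several partitions.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        for index in route(key, n_reducers):
            partitions[index].append((key, value))
    return partitions


def hash_route(key, n):
    """Conventional routing: each pair goes to exactly one partition."""
    return [key % n]


def make_replicating_route(broadcast_keys):
    """Claim 35 style: pairs with selected keys go to every partition."""
    def route(key, n):
        return range(n) if key in broadcast_keys else [key % n]
    return route


def broadcast_route(key, n):
    """Claim 36 style: every pair goes to every partition."""
    return range(n)


pairs = [(1, "a"), (2, "b"), (3, "c")]
single = partition_intermediate(pairs, 2, hash_route)
some = partition_intermediate(pairs, 2, make_replicating_route({2}))
everything = partition_intermediate(pairs, 2, broadcast_route)
print(single)
print(some)
print(everything)
```

Ordering a task queue by shared input files, as WO2006050349A2 claim 11 recites, does not by itself replicate key-value pairs across reducers in this way.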

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (input file) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (input file) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
WO2006050349A2
CLAIM 3
. A method of running an application on a computational grid comprising a plurality of computational units , the application comprised of a plurality of tasks , the method including the steps of : A) grouping the tasks according to the total number of computational units and total number of tasks based on an initial determination or assumption in respect of the relative processing power of the computational units constituting the computational grid ;
B) scheduling groups of tasks on computational units of the computational grid using a task queue ;
C) while there remain uncompleted tasks perform step D) ;
D) when a computational unit Pi completes the execution of at least one task , perform the following steps (second set, reduce method) (a) to (d) : (a) compute the mean execution time for the completed task on computational unit Pi ;
(b) update the task queue ;
(c) abort any still running replicas of the completed tasks ;
(d) if computational unit Pj is idle perform the following steps (i) if there are unfinished tasks on slower computational units then replicate the unfinished tasks on computational unit Pj ;
E) end

WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (input file) so that the output data set is a merging of a portion of the first and second intermediate data set .
WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (input file) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (input file) are provided to all of the reducers .
WO2006050349A2
CLAIM 11
. A method as claimed in any one of claims 2 to 10 wherein the task queue is ordered taking into account input file (first set, data set, first data set, output data set, intermediate data set) s which are shared between tasks or have a degree of association .




US20050273730A1

Filed: 2005-07-20     Issued: 2005-12-08

System and method for browsing hierarchically based node-link structures based on an estimated degree of interest

(Original Assignee) Card Stuart K; Nation David A     

Stuart Card, David Nation
US8190610B2
CLAIM 1
. A method of processing data of a data set (second set, more sets, first set) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step, said sub) group has a different schema than the data of a second data group (repeating step, said sub) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050273730A1
CLAIM 2
. The method as recited in claim 1 further comprising the step of : f) detecting that a user has selected a second focus node ;
g) generating a second degree of interest value for each of said plurality of nodes relative to said second focus node ;
h) repeating step (first data, first data group, second data group) s c)-e) using said second degree of interest value for each of said plurality of nodes .

US20050273730A1
CLAIM 6
. The method as recited in claim 1 wherein said step of identifying and performing any node compression necessary for displaying said linked information based on the layout of said plurality of nodes is further comprised of the step of : d1) determining from said layout that said nodes will not fit vertically into said display area ;
d2) identifying subtrees in said layout causing said layout not to fit in said display area ;
d3) causing said sub (first data, first data group, second data group) trees to be displayed in a manner proportionate to the size of the subtree .

US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (linked data) group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050273730A1
CLAIM 11
. System for browsing a collection of hierarchically linked data (particular data) comprising : display means having a display area for presenting views of a visualization of said collection of hierarchically linked data ;
input device for providing input to change view of said visualization of said collection of linked data based on dynamically selected linked data ;
and visualization processing element coupled to said display means and said input device , said visualization for creating a bounded tree structure visualization of said collection of hierarchically linked data based on a Degree of Interest relative to said focus node and sibling order distance from the focus node and a size of said display area .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (second set, more sets, first set) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step, said sub) group has a different schema than the data of a second data group (repeating step, said sub) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050273730A1
CLAIM 2
. The method as recited in claim 1 further comprising the step of : f) detecting that a user has selected a second focus node ;
g) generating a second degree of interest value for each of said plurality of nodes relative to said second focus node ;
h) repeating step (first data, first data group, second data group) s c)-e) using said second degree of interest value for each of said plurality of nodes .

US20050273730A1
CLAIM 6
. The method as recited in claim 1 wherein said step of identifying and performing any node compression necessary for displaying said linked information based on the layout of said plurality of nodes is further comprised of the step of : d1) determining from said layout that said nodes will not fit vertically into said display area ;
d2) identifying subtrees in said layout causing said layout not to fit in said display area ;
d3) causing said sub (first data, first data group, second data group) trees to be displayed in a manner proportionate to the size of the subtree .

US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (linked data) group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050273730A1
CLAIM 11
. System for browsing a collection of hierarchically linked data (particular data) comprising : display means having a display area for presenting views of a visualization of said collection of hierarchically linked data ;
input device for providing input to change view of said visualization of said collection of linked data based on dynamically selected linked data ;
and visualization processing element coupled to said display means and said input device , said visualization for creating a bounded tree structure visualization of said collection of hierarchically linked data based on a Degree of Interest relative to said focus node and sibling order distance from the focus node and a size of said display area .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (repeating step, said sub) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set (second set, more sets, first set) having a first set (second set, more sets, first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step, said sub) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set, more sets, first set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050273730A1
CLAIM 2
. The method as recited in claim 1 further comprising the step of : f) detecting that a user has selected a second focus node ;
g) generating a second degree of interest value for each of said plurality of nodes relative to said second focus node ;
h) repeating step (first data, first data group, second data group) s c)-e) using said second degree of interest value for each of said plurality of nodes .

US20050273730A1
CLAIM 6
. The method as recited in claim 1 wherein said step of identifying and performing any node compression necessary for displaying said linked information based on the layout of said plurality of nodes is further comprised of the step of : d1) determining from said layout that said nodes will not fit vertically into said display area ;
d2) identifying subtrees in said layout causing said layout not to fit in said display area ;
d3) causing said sub (first data, first data group, second data group) trees to be displayed in a manner proportionate to the size of the subtree .

US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .
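
For orientation only, the processing recited in claim 33 of US8190610B2 above can be pictured with a minimal single-process Python sketch. The two datasets, their field names, and the helpers `partition`, `map_employees`, `map_departments`, `run_group`, and `reduce_together` are hypothetical illustrations; they come neither from the patent nor from the cited reference, and a real system would run the map and reduce steps on separate machines.

```python
from collections import defaultdict

def partition(records, n):
    """Split a list of records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_employees(part):
    # Hypothetical first schema: (emp_id, name, dept_id).
    # Emit key-value pairs keyed by the common key dept_id.
    return [(dept_id, name) for _emp_id, name, dept_id in part]

def map_departments(part):
    # Hypothetical second schema: (dept_id, dept_name).
    return [(dept_id, dept_name) for dept_id, dept_name in part]

def run_group(records, map_fn, n_partitions=2):
    """Partition one data group and apply its own map function to each partition."""
    intermediate = []
    for part in partition(records, n_partitions):
        intermediate.extend(map_fn(part))
    return intermediate

def reduce_together(first, second):
    """Reduce both intermediate data sets together by iterating on the shared key."""
    grouped = defaultdict(lambda: ([], []))
    for key, value in first:
        grouped[key][0].append(value)
    for key, value in second:
        grouped[key][1].append(value)
    output = []
    for key in sorted(grouped):
        names, dept_names = grouped[key]
        # One iteration per group; the output schema (name, dept_name)
        # differs from both input schemas.
        for dept_name in dept_names:
            for name in names:
                output.append((name, dept_name))
    return output

employees = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]
departments = [(10, "Eng"), (20, "Ops")]

first_intermediate = run_group(employees, map_employees)
second_intermediate = run_group(departments, map_departments)
joined = reduce_together(first_intermediate, second_intermediate)
```

In this sketch the only link between the two groups is the shared `dept_id` key, which is the role the claim assigns to the "key in common".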

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (second set, more sets, first set) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (second set, more sets, first set) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .
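
As a rough illustration of the partitioning recited in claim 35 of US8190610B2 above, the following Python sketch routes intermediate key-value pairs to several reducers and lets some pairs land in more than one partition. The `route` helper, the modulo routing rule, and the sample data are assumptions made for this example; the patent does not prescribe this particular scheme.

```python
def route(pairs, n_reducers, replicate=False):
    """Assign intermediate (key, value) pairs to reducer partitions.

    With replicate=False each pair goes to exactly one partition (key modulo
    n_reducers). With replicate=True each pair is copied to every partition,
    so a pair is provided to more than one partition.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        targets = range(n_reducers) if replicate else [key % n_reducers]
        for r in targets:
            partitions[r].append((key, value))
    return partitions

# Hypothetical intermediate data sets with integer keys.
first = [(10, "Ann"), (11, "Bob"), (12, "Cy")]
second = [(10, "Eng"), (11, "Ops")]

first_parts = route(first, 2)                    # split across reducers
second_parts = route(second, 2, replicate=True)  # copied to every reducer

# Each reducer receives its own slice of the first set plus a full copy of the second.
reducer_inputs = list(zip(first_parts, second_parts))
```

Calling `route` with `replicate=True` for both intermediate data sets would correspond to the situation in dependent claim 36, where every pair reaches every reducer.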

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (second set, more sets, first set) are provided to all of the reducers .
US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (second set, more sets, first set) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (repeating step, said sub) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (second set, more sets, first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step, said sub) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set, more sets, first set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050273730A1
CLAIM 2
. The method as recited in claim 1 further comprising the step of : f) detecting that a user has selected a second focus node ;
g) generating a second degree of interest value for each of said plurality of nodes relative to said second focus node ;
h) repeating step (first data, first data group, second data group) s c)-e) using said second degree of interest value for each of said plurality of nodes .

US20050273730A1
CLAIM 6
. The method as recited in claim 1 wherein said step of identifying and performing any node compression necessary for displaying said linked information based on the layout of said plurality of nodes is further comprised of the step of : d1) determining from said layout that said nodes will not fit vertically into said display area ;
d2) identifying subtrees in said layout causing said layout not to fit in said display area ;
d3) causing said sub (first data, first data group, second data group) trees to be displayed in a manner proportionate to the size of the subtree .

US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (second set, more sets, first set) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (second set, more sets, first set) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (second set, more sets, first set) are provided to all of the reducers .
US20050273730A1
CLAIM 8
. The method as recited in claim 1 wherein said step of displaying said linked information is further comprised of the step of displaying a first set (second set, first set, data set, first data set, second data set) of data items associated with said nodes .

US20050273730A1
CLAIM 9
. The method as recited in claim 8 further comprising the step of : detecting that a user has requested that a second set (second set, first set, data set, first data set, second data set) of data items associated with said nodes be displayed ;
and displaying said second set of data items associated with said nodes .

US20050273730A1
CLAIM 19
. The method of claim 1 , further comprising the steps of : determining one or more sets (second set, first set, data set, first data set, second data set) of nodes from the plurality of linked nodes , the nodes in each set associated with a plurality of faces ;
associating groups of related display items to successive faces of the one or more sets of nodes from the plurality of linked nodes ;
and rotating all of the nodes in each set simultaneously based on a user input .




WO2006019752A1

Filed: 2005-07-12     Issued: 2006-02-23

Methods for authorizing transmission of content from first to second individual and authentication an individual based on an individual’s social network

(Original Assignee) Friendster, Inc.     

Christopher Lunt
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
WO2006019752A1
CLAIM 9
. A method of authenticating an individual based on an approved list of users , a black list of users , and the individual's social network , comprising the steps of : receiving input (different schema) s by the individual , said inputs including identifying information of the individual ;
generating a gray list based on the black list and the individual's social network ;
and authenticating the individual if the individual is connected to a user who is on the approved list and in the individual's social network , along a path that does not traverse through anyone identified in the gray list .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
WO2006019752A1
CLAIM 9
. A method of authenticating an individual based on an approved list of users , a black list of users , and the individual's social network , comprising the steps of : receiving input (different schema) s by the individual , said inputs including identifying information of the individual ;
generating a gray list based on the black list and the individual's social network ;
and authenticating the individual if the individual is connected to a user who is on the approved list and in the individual's social network , along a path that does not traverse through anyone identified in the gray list .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
WO2006019752A1
CLAIM 9
. A method of authenticating an individual based on an approved list of users , a black list of users , and the individual's social network , comprising the steps of : receiving input (different schema) s by the individual , said inputs including identifying information of the individual ;
generating a gray list based on the black list and the individual's social network ;
and authenticating the individual if the individual is connected to a user who is on the approved list and in the individual's social network , along a path that does not traverse through anyone identified in the gray list .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
WO2006019752A1
CLAIM 9
. A method of authenticating an individual based on an approved list of users , a black list of users , and the individual's social network , comprising the steps of : receiving input (different schema) s by the individual , said inputs including identifying information of the individual ;
generating a gray list based on the black list and the individual's social network ;
and authenticating the individual if the individual is connected to a user who is on the approved list and in the individual's social network , along a path that does not traverse through anyone identified in the gray list .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (receiving input) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
WO2006019752A1
CLAIM 1
. A method of authorizing transmission of content relating to a first individual to a second individual based on the second individual's social network , comprising the steps of : maintaining a first set of records for the second individual ;
generating a second set (second set) of records based on the first set and the second individual's social network ;
and authorizing the transmission of content to the second individual if the first individual and the second individual are connected in the second individual's social network along a path that does not traverse through any individual identified in the second set .

WO2006019752A1
CLAIM 9
. A method of authenticating an individual based on an approved list of users , a black list of users , and the individual's social network , comprising the steps of : receiving input (different schema) s by the individual , said inputs including identifying information of the individual ;
generating a gray list based on the black list and the individual's social network ;
and authenticating the individual if the individual is connected to a user who is on the approved list and in the individual's social network , along a path that does not traverse through anyone identified in the gray list .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (receiving input) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
WO2006019752A1
CLAIM 1
. A method of authorizing transmission of content relating to a first individual to a second individual based on the second individual's social network , comprising the steps of : maintaining a first set of records for the second individual ;
generating a second set (second set) of records based on the first set and the second individual's social network ;
and authorizing the transmission of content to the second individual if the first individual and the second individual are connected in the second individual's social network along a path that does not traverse through any individual identified in the second set .

WO2006019752A1
CLAIM 9
. A method of authenticating an individual based on an approved list of users , a black list of users , and the individual's social network , comprising the steps of : receiving input (different schema) s by the individual , said inputs including identifying information of the individual ;
generating a gray list based on the black list and the individual's social network ;
and authenticating the individual if the individual is connected to a user who is on the approved list and in the individual's social network , along a path that does not traverse through anyone identified in the gray list .




JP2006018843A

Filed: 2005-07-01     Issued: 2006-01-19

Distributing search engine results by using page category information (ページカテゴリ情報の使用による検索エンジン結果の分散)

(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     

Nicole A Hamilton, Gregory N Hullender, Bama Ramarathnam, Darren A Shakib, エヌ.ヒューレンダー グレゴリー, エー.シャキブ ダレン, エー.ハミルトン ニコル, ラマラスナム バマ
US8190610B2
CLAIM 1
. A method of processing data of a data set (アップ) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one (各カテゴリ) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2006018843A
CLAIM 1
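A method of generating search results, the method comprising [えること] (data group, first data group): receiving a search request from a user; querying an index that is searchable as a function of the received search request and that contains document data defining one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents that may be relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.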

JP2006018843A
CLAIM 4
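The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a rank value associated with each identified electronic document in the particular category.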

JP2006018843A
CLAIM 8
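The computer-readable medium of claim 7, wherein the document data includes a rank value associated with each of the electronic documents related to the one or more categories, the rank value is stored in the index and indicates the relevance of a particular electronic document to a particular category, and the user interface component is configured to display a predetermined number of identified electronic documents in each of the different categories [各カテゴリ] (selected one) as a function of the rank value.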

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2006018843A
CLAIM 1
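A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.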

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2006018843A
CLAIM 1
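A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.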

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2006018843A
CLAIM 1
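A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.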

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2006018843A
CLAIM 1
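A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.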

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2006018843A
CLAIM 1
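A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.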

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2006018843A
CLAIM 1
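A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.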

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data that is not intermediate data (含むコンピュータ) .
JP2006018843A
CLAIM 7
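A computer-readable medium containing computer-executable instructions [含むコンピュータ] (not intermediate data) for generating search results, the computer-readable medium comprising: a search form component that receives a search request from a user; a search engine component that queries, as a function of the received search request, a searchable index associated with a plurality of electronic documents and containing document data that identifies one or more categories associated with each electronic document, to identify electronic documents potentially relevant to the search request, and that sorts the results of the query as a function of the one or more categories associated with the identified electronic documents; and a user interface component that displays the sorted results to the user such that the one or more identified electronic documents in different categories are displayed to the user on a single page.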

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with another reducer .
JP2006018843A
CLAIM 7
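A computer-readable medium containing computer-executable instructions [含むコンピュータ] (not intermediate data) for generating search results, the computer-readable medium comprising: a search form component that receives a search request from a user; a search engine component that queries, as a function of the received search request, a searchable index associated with a plurality of electronic documents and containing document data that identifies one or more categories associated with each electronic document, to identify electronic documents potentially relevant to the search request, and that sorts the results of the query as a function of the one or more categories associated with the identified electronic documents; and a user interface component that displays the sorted results to the user such that the one or more identified electronic documents in different categories are displayed to the user on a single page.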

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with that reducer .
JP2006018843A
CLAIM 7
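A computer-readable medium containing computer-executable instructions [含むコンピュータ] (not intermediate data) for generating search results, the computer-readable medium comprising: a search form component that receives a search request from a user; a search engine component that queries, as a function of the received search request, a searchable index associated with a plurality of electronic documents and containing document data that identifies one or more categories associated with each electronic document, to identify electronic documents potentially relevant to the search request, and that sorts the results of the query as a function of the one or more categories associated with the identified electronic documents; and a user interface component that displays the sorted results to the user such that the one or more identified electronic documents in different categories are displayed to the user on a single page.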

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (アップ) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one (各カテゴリ) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2006018843A
CLAIM 1
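A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.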

JP2006018843A
CLAIM 4
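The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.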

JP2006018843A
CLAIM 8
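The computer-readable medium of claim 7, wherein the document data includes a ranking value associated with each of the electronic documents related to the one or more categories; the ranking value is stored in the index and indicates the relevance of a particular electronic document to a particular category; and the user interface component is configured to display a predetermined number of the identified electronic documents in each of the different categories [各カテゴリ] (selected one) as a function of the ranking value.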

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2006018843A
CLAIM 1
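A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.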

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2006018843A
CLAIM 1
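A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.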

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2006018843A
CLAIM 1
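A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.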

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2006018843A
CLAIM 1
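A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.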

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2006018843A
CLAIM 1
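A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.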

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2006018843A
CLAIM 1
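A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.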

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data (含むコンピュータ) .
JP2006018843A
CLAIM 7
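A computer-readable medium containing computer-executable instructions [含むコンピュータ] (not intermediate data) for generating search results, the computer-readable medium comprising: a search form component that receives a search request from a user; a search engine component that queries, as a function of the received search request, a searchable index associated with a plurality of electronic documents and containing document data that identifies one or more categories associated with each electronic document, to identify electronic documents potentially relevant to the search request, and that sorts the results of the query as a function of the one or more categories associated with the identified electronic documents; and a user interface component that displays the sorted results to the user such that the one or more identified electronic documents in different categories are displayed to the user on a single page.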

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with another reducer .
JP2006018843A
CLAIM 7
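A computer-readable medium containing computer-executable instructions [含むコンピュータ] (not intermediate data) for generating search results, the computer-readable medium comprising: a search form component that receives a search request from a user; a search engine component that queries, as a function of the received search request, a searchable index associated with a plurality of electronic documents and containing document data that identifies one or more categories associated with each electronic document, to identify electronic documents potentially relevant to the search request, and that sorts the results of the query as a function of the one or more categories associated with the identified electronic documents; and a user interface component that displays the sorted results to the user such that the one or more identified electronic documents in different categories are displayed to the user on a single page.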

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with that reducer .
JP2006018843A
CLAIM 7
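A computer-readable medium containing computer-executable instructions [含むコンピュータ] (not intermediate data) for generating search results, the computer-readable medium comprising: a search form component that receives a search request from a user; a search engine component that queries, as a function of the received search request, a searchable index associated with a plurality of electronic documents and containing document data that identifies one or more categories associated with each electronic document, to identify electronic documents potentially relevant to the search request, and that sorts the results of the query as a function of the one or more categories associated with the identified electronic documents; and a user interface component that displays the sorted results to the user such that the one or more identified electronic documents in different categories are displayed to the user on a single page.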

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (アップ) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one (各カテゴリ) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set (アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2006018843A
CLAIM 1
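A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.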

JP2006018843A
CLAIM 4
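The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.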

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).

JP2006018843A
CLAIM 8
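The computer-readable medium of claim 7, wherein the document data includes a ranking value associated with each of the electronic documents related to the one or more categories; the ranking value is stored in the index and indicates the relevance of a particular electronic document to a particular category; and the user interface component is configured to display a predetermined number of the identified electronic documents in each of the different categories [各カテゴリ] (selected one) as a function of the ranking value.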

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set (アップ) is a merging of a portion of the first and second intermediate data set .
JP2006018843A
CLAIM 4
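The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.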

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set (アップ) of each partition to a separate one of the reducers .
JP2006018843A
CLAIM 4
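The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.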

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2006018843A
CLAIM 4
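The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.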

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).
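Claims 35 and 36 of US8190610B2 above recite partitioning the intermediate data so that some key-value pairs go to more than one partition (claim 35), or all pairs go to all reducers (claim 36), with each partition handed to a separate reducer. A minimal Python sketch of that routing follows. The function names (`partition_with_replication`, `neighbour_targets`, `broadcast_targets`) and the integer-key routing rule are assumptions made for illustration, not anything defined in the patent.

```python
def partition_with_replication(pairs, n_reducers, targets):
    """Route each (key, value) pair to every reducer index that targets() names.

    A pair may land in more than one partition, which is the replication
    described in claim 35.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        for index in targets(key, n_reducers):
            partitions[index].append((key, value))
    return partitions


def neighbour_targets(key, n_reducers):
    """Hypothetical rule: send a pair to its home reducer and the next one."""
    return [key % n_reducers, (key + 1) % n_reducers]


def broadcast_targets(key, n_reducers):
    """Send every pair to every reducer (the claim 36 case)."""
    return list(range(n_reducers))


def run_reducers(partitions, reducer):
    """Give each partition to its own reducer call and collect the results."""
    return [reducer(part) for part in partitions]


pairs = [(0, "a"), (1, "b"), (2, "c")]

replicated = partition_with_replication(pairs, 3, neighbour_targets)
broadcast = partition_with_replication(pairs, 3, broadcast_targets)

# A trivial reducer that just counts the pairs it was given.
counts = run_reducers(replicated, len)
```

Replication of this kind is what lets a single reducer see values it would not receive under ordinary one-destination hashing, for example when a join condition is not a simple equality on the key.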

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (アップ) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one (各カテゴリ) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set (アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2006018843A
CLAIM 1
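A method of generating search results, the method comprising [備えること] (data group, first data group): receiving a search request from a user; querying, as a function of the received search request, a searchable index containing document data that defines one or more categories associated with each of a plurality of electronic documents, to identify the electronic documents potentially relevant to the search request; sorting the results of the query as a function of the one or more categories associated with the identified electronic documents; and displaying the sorted results to the user such that one or more of the identified electronic documents in different categories are displayed to the user on a single page.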

JP2006018843A
CLAIM 4
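The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.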

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).

JP2006018843A
CLAIM 8
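The computer-readable medium of claim 7, wherein the document data includes a ranking value associated with each of the electronic documents related to the one or more categories; the ranking value is stored in the index and indicates the relevance of a particular electronic document to a particular category; and the user interface component is configured to display a predetermined number of the identified electronic documents in each of the different categories [各カテゴリ] (selected one) as a function of the ranking value.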

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set (アップ) is a merging of a portion of the first and second intermediate data set .
JP2006018843A
CLAIM 4
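The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.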

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set (アップ) of each partition to a separate one of the reducers .
JP2006018843A
CLAIM 4
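The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.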

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2006018843A
CLAIM 4
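The method of claim 2, wherein the displaying includes displaying the sorted results in the form of a plurality of groups, each corresponding to a particular category, and each group lists [リストアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) descriptions of the identified electronic documents in the particular category in descending order, based on a ranking value associated with each identified electronic document in the particular category.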

JP2006018843A
CLAIM 5
The method of claim 2, wherein each of the electronic documents comprises one or more of a web page and a multimedia file [マルチメディアファイル] (second intermediate data).




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2006020310A

Filed: 2005-06-28     Issued: 2006-01-19

Section data filtering method and apparatus

(Original Assignee) Samsung Electronics Co Ltd (三星電子株式会社)

Gyung-Pyo Hong, 競杓 洪
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2006020310A
CLAIM 5
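The apparatus of claim 1, wherein the apparatus comprises [備えること] (data group, first data group): a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver; b) an accumulated value calculation unit that receives the section numbers and calculates a 1-accumulated value, which represents the cumulative count of bit value 1 at each bit position of the section numbers, and a 0-accumulated value, which represents the cumulative count of bit value 0; c) an excess determination unit that decides, based on the 0-accumulated value and the 1-accumulated value, whether to generate the mask; and d) a section number mask generation unit that, when mask generation is decided, generates a mask based on the bit positions in the section number at which the 0-accumulated value or the 1-accumulated value was found to be in excess.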

JP2006020310A
CLAIM 37
A computer-readable recording medium on which is recorded a program [のプログラム] (corresponding different intermediate data) for causing a computer [コンピュ] (processing data) to execute the method of claim 1.

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2006020310A
CLAIM 5
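The apparatus of claim 1, wherein the apparatus comprises [備えること] (data group, first data group): a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver; b) an accumulated value calculation unit that receives the section numbers and calculates a 1-accumulated value, which represents the cumulative count of bit value 1 at each bit position of the section numbers, and a 0-accumulated value, which represents the cumulative count of bit value 0; c) an excess determination unit that decides, based on the 0-accumulated value and the 1-accumulated value, whether to generate the mask; and d) a section number mask generation unit that, when mask generation is decided, generates a mask based on the bit positions in the section number at which the 0-accumulated value or the 1-accumulated value was found to be in excess.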

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2006020310A
CLAIM 5
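The apparatus of claim 1, wherein the apparatus comprises [備えること] (data group, first data group): a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver; b) an accumulated value calculation unit that receives the section numbers and calculates a 1-accumulated value, which represents the cumulative count of bit value 1 at each bit position of the section numbers, and a 0-accumulated value, which represents the cumulative count of bit value 0; c) an excess determination unit that decides, based on the 0-accumulated value and the 1-accumulated value, whether to generate the mask; and d) a section number mask generation unit that, when mask generation is decided, generates a mask based on the bit positions in the section number at which the 0-accumulated value or the 1-accumulated value was found to be in excess.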

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2006020310A
CLAIM 5
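The apparatus of claim 1, wherein the apparatus comprises [備えること] (data group, first data group): a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver; b) an accumulated value calculation unit that receives the section numbers and calculates a 1-accumulated value, which represents the cumulative count of bit value 1 at each bit position of the section numbers, and a 0-accumulated value, which represents the cumulative count of bit value 0; c) an excess determination unit that decides, based on the 0-accumulated value and the 1-accumulated value, whether to generate the mask; and d) a section number mask generation unit that, when mask generation is decided, generates a mask based on the bit positions in the section number at which the 0-accumulated value or the 1-accumulated value was found to be in excess.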

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2006020310A
CLAIM 5
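The apparatus of claim 1, wherein the apparatus comprises [備えること] (data group, first data group): a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver; b) an accumulated value calculation unit that receives the section numbers and calculates a 1-accumulated value, which represents the cumulative count of bit value 1 at each bit position of the section numbers, and a 0-accumulated value, which represents the cumulative count of bit value 0; c) an excess determination unit that decides, based on the 0-accumulated value and the 1-accumulated value, whether to generate the mask; and d) a section number mask generation unit that, when mask generation is decided, generates a mask based on the bit positions in the section number at which the 0-accumulated value or the 1-accumulated value was found to be in excess.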

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2006020310A
CLAIM 5
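The apparatus of claim 1, wherein the apparatus comprises [備えること] (data group, first data group): a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver; b) an accumulated value calculation unit that receives the section numbers and calculates a 1-accumulated value, which represents the cumulative count of bit value 1 at each bit position of the section numbers, and a 0-accumulated value, which represents the cumulative count of bit value 0; c) an excess determination unit that decides, based on the 0-accumulated value and the 1-accumulated value, whether to generate the mask; and d) a section number mask generation unit that, when mask generation is decided, generates a mask based on the bit positions in the section number at which the 0-accumulated value or the 1-accumulated value was found to be in excess.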

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2006020310A
CLAIM 5
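The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .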

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2006020310A
CLAIM 37
A computer-readable recording medium on which is recorded a program for causing a computer [コンピュータ] (processing data) to execute the method according to claim 1 .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2006020310A
CLAIM 5
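The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .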

JP2006020310A
CLAIM 37
A computer-readable recording medium on which is recorded a program [プログラム] (corresponding different intermediate data) for causing a computer to execute the method according to claim 1 .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2006020310A
CLAIM 5
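The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .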

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2006020310A
CLAIM 5
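The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .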

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2006020310A
CLAIM 5
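The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .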

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2006020310A
CLAIM 5
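The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .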

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2006020310A
CLAIM 5
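The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .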

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2006020310A
CLAIM 5
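The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .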

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2006020310A
CLAIM 37
A computer-readable recording medium on which is recorded a program for causing a computer [コンピュータ] (processing data) to execute the method according to claim 1 .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2006020310A
CLAIM 5
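The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .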

JP2006020310A
CLAIM 37
A computer-readable recording medium on which is recorded a program for causing a computer [コンピュータ] (processing data) to execute the method according to claim 1 .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2006020310A
CLAIM 5
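The apparatus according to claim 1 , wherein the apparatus comprises [備えること] (data group, first data group) : a) a section number extraction unit that extracts a section number and a last section number from section data already stored in the digital broadcast receiver ; b) a cumulative value calculation unit that receives the section number and calculates a 1 cumulative value , which represents the cumulative number of occurrences of bit value 1 at each bit position of the section number , and a 0 cumulative value , which represents the cumulative number of occurrences of bit value 0 ; c) an excess determination unit that determines , based on the 0 cumulative value and the 1 cumulative value , whether to generate the mask ; and d) a section number mask generation unit that , when generation of a mask is determined , generates the mask based on the bit positions in the section number at which the 0 cumulative value or the 1 cumulative value is exceeded .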




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060218123A1

Filed: 2005-06-02     Issued: 2006-09-28

System and Methodology for Parallel Query Optimization Using Semantic-Based Partitioning

(Original Assignee) Sybase Inc     (Current Assignee) Sybase Inc

Sudipto Chowdhuri, Mihnea Andrei
US8190610B2
CLAIM 1
. A method of processing data (processing data) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (data partitions) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said memory) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060218123A1
CLAIM 5
. The method of claim 3 , wherein said plurality of partitions includes selected ones of data partitions (data partitions) and index partitions .

US20060218123A1
CLAIM 9
. The method of claim 1 , wherein said adding step includes cloning an operator in a subplan into a plurality of operators for processing data (processing data) in parallel .

US20060218123A1
CLAIM 37
. In a database system comprising a database storing data in database tables , a method for improving query performance comprising : receiving a query specifying a join of two or more database tables ;
as data is retrieved from the database during processing of the query , partitioning said data into separate memory buffers ;
and processing said query in parallel by concurrently processing said data in said memory (second data) buffers .
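
For orientation when reading the mapping above, the following is a minimal Python sketch of the kind of processing that claim 1 of US8190610B2 recites: two data groups with different schemas are partitioned, mapped by group-specific map functions, tagged so the intermediate data stays identifiable to its group, and merged in the reduce step on a common key. The group names, record layouts, and helper names (`partition`, `map_employees`, `map_salaries`, `reduce_join`) are hypothetical illustrations, not taken from the patent or from any cited reference, and everything runs in a single process rather than on a distributed system.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key "emp_id".
employees = [(1, {"name": "Ann"}), (2, {"name": "Bob"})]            # emp_id -> name
salaries = [(1, {"amount": 100}), (1, {"amount": 50}), (2, {"amount": 70})]  # emp_id -> amount


def partition(records, n):
    """Split one data group into n data partitions (round-robin, an assumption)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    """Map function for the first group: emit (key, name)."""
    return [(key, rec["name"]) for key, rec in part]


def map_salaries(part):
    """Map function for the second group: emit (key, amount)."""
    return [(key, rec["amount"]) for key, rec in part]


def run_map(groups, n_partitions=2):
    """Map each group independently and keep intermediate data tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Reduce: iterate each group's values separately, then merge on the common key."""
    names = iter(values_by_group.get("employees", []))
    amounts = iter(values_by_group.get("salaries", []))
    total = sum(amounts)
    return [(key, name, total) for name in names]


def grouped_map_reduce(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


result = grouped_map_reduce({
    "employees": (employees, map_employees),
    "salaries": (salaries, map_salaries),
})
print(result)  # [(1, 'Ann', 150), (2, 'Bob', 70)]
```

The output records combine fields from both groups, so the output schema differs from either input schema, which is the property the independent claims emphasize.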

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (partitioning step) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20060218123A1
CLAIM 38
. The method of claim 37 , wherein said partitioning step (partitioning step) includes dividing the data into a plurality of data streams during processing of the query .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (processing data) that is not intermediate data .
US20060218123A1
CLAIM 9
. The method of claim 1 , wherein said adding step includes cloning an operator in a subplan into a plurality of operators for processing data (processing data) in parallel .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (data partitions) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said memory) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060218123A1
CLAIM 5
. The method of claim 3 , wherein said plurality of partitions includes selected ones of data partitions (data partitions) and index partitions .

US20060218123A1
CLAIM 37
. In a database system comprising a database storing data in database tables , a method for improving query performance comprising : receiving a query specifying a join of two or more database tables ;
as data is retrieved from the database during processing of the query , partitioning said data into separate memory buffers ;
and processing said query in parallel by concurrently processing said data in said memory (second data) buffers .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (processing data) that is not intermediate data .
US20060218123A1
CLAIM 9
. The method of claim 1 , wherein said adding step includes cloning an operator in a subplan into a plurality of operators for processing data (processing data) in parallel .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (processing data) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data partitions) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said memory) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060218123A1
CLAIM 1
. In a database system comprising a database storing data in database tables , a method for improving query performance by dynamically partitioning said data , the method comprising : generating a plurality of subplans for obtaining data requested by the query , each subplan including one (second set) or more operators for performing relational operations ;
determining if partitioning of data is potentially useful for performing a given relational operation ;
adding operators for partitioning data and performing the given relational operation in parallel to at least some of said plurality of subplans if partitioning of data is determined to be potentially useful ;
and building a plan for execution of the query based , at least in part , upon selecting subplans having favorable execution costs .

US20060218123A1
CLAIM 5
. The method of claim 3 , wherein said plurality of partitions includes selected ones of data partitions (data partitions) and index partitions .

US20060218123A1
CLAIM 9
. The method of claim 1 , wherein said adding step includes cloning an operator in a subplan into a plurality of operators for processing data (processing data) in parallel .

US20060218123A1
CLAIM 37
. In a database system comprising a database storing data in database tables , a method for improving query performance comprising : receiving a query specifying a join of two or more database tables ;
as data is retrieved from the database during processing of the query , partitioning said data into separate memory buffers ;
and processing said query in parallel by concurrently processing said data in said memory (second data) buffers .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data partitions) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said memory) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060218123A1
CLAIM 1
. In a database system comprising a database storing data in database tables , a method for improving query performance by dynamically partitioning said data , the method comprising : generating a plurality of subplans for obtaining data requested by the query , each subplan including one (second set) or more operators for performing relational operations ;
determining if partitioning of data is potentially useful for performing a given relational operation ;
adding operators for partitioning data and performing the given relational operation in parallel to at least some of said plurality of subplans if partitioning of data is determined to be potentially useful ;
and building a plan for execution of the query based , at least in part , upon selecting subplans having favorable execution costs .

US20060218123A1
CLAIM 5
. The method of claim 3 , wherein said plurality of partitions includes selected ones of data partitions (data partitions) and index partitions .

US20060218123A1
CLAIM 37
. In a database system comprising a database storing data in database tables , a method for improving query performance comprising : receiving a query specifying a join of two or more database tables ;
as data is retrieved from the database during processing of the query , partitioning said data into separate memory buffers ;
and processing said query in parallel by concurrently processing said data in said memory (second data) buffers .
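
As a point of comparison, claim 37 of US20060218123A1 describes partitioning retrieved data into separate memory buffers and processing a join query in parallel over those buffers. A rough Python sketch of that general idea follows. It assumes integer join keys in the first column, modulo partitioning, and a thread pool as the parallel mechanism. These choices, and the names `partition_into_buffers`, `join_buffer`, and `parallel_join`, are illustrative assumptions rather than details drawn from the Sybase publication.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_into_buffers(rows, key_index, n_buffers):
    """Place each row in one in-memory buffer chosen by its join key (assumes int keys)."""
    buffers = [[] for _ in range(n_buffers)]
    for row in rows:
        buffers[row[key_index] % n_buffers].append(row)
    return buffers


def join_buffer(left_rows, right_rows):
    """Hash-join the rows of one buffer pair on the first column."""
    index = {}
    for row in left_rows:
        index.setdefault(row[0], []).append(row)
    return [l + r[1:] for r in right_rows for l in index.get(r[0], [])]


def parallel_join(left, right, n_buffers=2):
    """Partition both tables the same way, then join matching buffers concurrently."""
    left_bufs = partition_into_buffers(left, 0, n_buffers)
    right_bufs = partition_into_buffers(right, 0, n_buffers)
    with ThreadPoolExecutor(max_workers=n_buffers) as pool:
        chunks = pool.map(join_buffer, left_bufs, right_bufs)
        return sorted(row for chunk in chunks for row in chunk)


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
joined = parallel_join(customers, orders)
print(joined)  # [(1, 'Ann', 'book'), (1, 'Ann', 'lamp'), (2, 'Bob', 'pen')]
```

Because both tables are partitioned on the same key, rows that can join always land in the same buffer pair, so each pair can be processed without coordination with the others.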




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060004851A1

Filed: 2005-05-20     Issued: 2006-01-05

Object process graph relational database interface

(Original Assignee) GraphLogic Inc     (Current Assignee) GraphLogic Inc

Steven Gold, David Baker
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema (data object) than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060004851A1
CLAIM 6
. The system of claim 1 , further comprising : a graph data object (different schema) definer for creating at least one description corresponding to said at least one schema ;
and an object process graph system for importing said at least one description .

US20060004851A1
CLAIM 11
. The system of claim 7 , wherein said plurality of tables includes a first table , a second table , and a third table , said first table being in a many-to-many relationship with said third table , said second table connecting said first and third tables , said first table corresponding to an array of first composite data nodes , said first composite data nodes being associated with a first data (first data) array node , said first data array node having an array of third composite data nodes corresponding to said third table , said array of third composite data nodes being associated with a second data (second data) array node , and said second data array node being associated with said array of first composite data nodes .

US8190610B2
CLAIM 6
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (software product) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , each key/value pair of the intermediate data being provided to a separate one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 7
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (software product) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , at least some of the key/value pairs of the intermediate data being provided to more than one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .
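
Claims 6 and 7 of US8190610B2 (and claim 8, which depends on claim 7) differ only in how intermediate key/value pairs are distributed among the partitions that feed the reducers. The Python sketch below contrasts the three variants under stated assumptions: a CRC32 hash of the key selects the partition, and a caller-supplied `shared_keys` set decides which pairs are replicated. Both are hypothetical choices made for illustration; the claims do not prescribe a particular partitioning function.

```python
import zlib


def _slot(key, n):
    """Deterministic partition index for a key (CRC32 is an illustrative choice)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n


def partition_exclusive(pairs, n):
    """Claim 6 style: each key/value pair is provided to exactly one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[_slot(key, n)].append((key, value))
    return parts


def partition_overlapping(pairs, n, shared_keys):
    """Claim 7 style: pairs whose key is in shared_keys go to every partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        targets = range(n) if key in shared_keys else [_slot(key, n)]
        for i in targets:
            parts[i].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    """Claim 8 style: all key/value pairs are provided to all partitions."""
    return [list(pairs) for _ in range(n)]


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
exclusive = partition_exclusive(pairs, 3)
overlapping = partition_overlapping(pairs, 3, shared_keys={"a"})
broadcast = partition_broadcast(pairs, 3)
```

In each variant, the contents of partition `i` would then be handed to reducer `i`, matching the final limitation of claims 6 and 7.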

US8190610B2
CLAIM 10
. The method of claim 9 , wherein : the reducing step (software product) includes processing the metadata .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step (software product) .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (software product) is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (data object) than the iterator corresponding to another particular data group , for that reducer .
US20060004851A1
CLAIM 6
. The system of claim 1 , further comprising : a graph data object (different schema) definer for creating at least one description corresponding to said at least one schema ;
and an object process graph system for importing said at least one description .

US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step (software product) further comprises processing data that is not intermediate data .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step (software product) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step (software product) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step (software product) includes relating the data among the plurality of data groups .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema (data object) than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060004851A1
CLAIM 6
. The system of claim 1 , further comprising : a graph data object (different schema) definer for creating at least one description corresponding to said at least one schema ;
and an object process graph system for importing said at least one description .

US20060004851A1
CLAIM 11
. The system of claim 7 , wherein said plurality of tables includes a first table , a second table , and a third table , said first table being in a many-to-many relationship with said third table , said second table connecting said first and third tables , said first table corresponding to an array of first composite data nodes , said first composite data nodes being associated with a first data (first data) array node , said first data array node having an array of third composite data nodes corresponding to said third table , said array of third composite data nodes being associated with a second data (second data) array node , and said second data array node being associated with said array of first composite data nodes .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (data object) than the iterator corresponding to another particular data group , for that reducer .
US20060004851A1
CLAIM 6
. The system of claim 1 , further comprising : a graph data object (different schema) definer for creating at least one description corresponding to said at least one schema ;
and an object process graph system for importing said at least one description .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (data object) over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060004851A1
CLAIM 6
. The system of claim 1 , further comprising : a graph data object (different schema) definer for creating at least one description corresponding to said at least one schema ;
and an object process graph system for importing said at least one description .

US20060004851A1
CLAIM 11
. The system of claim 7 , wherein said plurality of tables includes a first table , a second table , and a third table , said first table being in a many-to-many relationship with said third table , said second table connecting said first and third tables , said first table corresponding to an array of first composite data nodes , said first composite data nodes being associated with a first data (first data) array node , said first data array node having an array of third composite data nodes corresponding to said third table , said array of third composite data nodes being associated with a second data (second data) array node , and said second data array node being associated with said array of first composite data nodes .

US8190610B2
CLAIM 39
. The map-reduce method of claim 38 , wherein iterating includes providing the associated metadata to the processing of the reducing step (software product) .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (data object) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060004851A1
CLAIM 6
. The system of claim 1 , further comprising : a graph data object (different schema) definer for creating at least one description corresponding to said at least one schema ;
and an object process graph system for importing said at least one description .

US20060004851A1
CLAIM 11
. The system of claim 7 , wherein said plurality of tables includes a first table , a second table , and a third table , said first table being in a many-to-many relationship with said third table , said second table connecting said first and third tables , said first table corresponding to an array of first composite data nodes , said first composite data nodes being associated with a first data (first data) array node , said first data array node having an array of third composite data nodes corresponding to said third table , said array of third composite data nodes being associated with a second data (second data) array node , and said second data array node being associated with said array of first composite data nodes .

US8190610B2
CLAIM 46
. The computer system of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step (software product) .
US20060004851A1
CLAIM 12
. A software product (reducing step) stored as instructions on a storage medium for performing a method of providing an object process graph relational database interface , the method comprising : creating at least one schema that is storable in a relational database management system corresponding to at least one object process graph ;
creating a plurality of descriptions of changes to said at least one object process graph , said descriptions being grouped by transaction ;
and selectively storing said descriptions in said relational database management system .
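
For orientation only, the iterator limitation recited in claims 11 , 39 and 46 of US8190610B2 (an iterator per data group that provides that group's metadata to the processing of the reducing step) can be pictured with a minimal Python sketch. The sketch is not taken from the patent or from US20060004851A1 ; the names `GroupIterator` and `reduce_with_metadata` , the field names and the sample values are all hypothetical, and the metadata is assumed to be a simple list of schema field names.

```python
class GroupIterator:
    """Iterates one data group's intermediate values and carries that group's metadata.

    Hypothetical illustration: `metadata` is assumed to hold the group's field names.
    """

    def __init__(self, group_name, values, metadata):
        self.group_name = group_name
        self.metadata = metadata
        self._values = list(values)

    def __iter__(self):
        # Label each raw value tuple with the field names from this group's metadata,
        # so the reducer receives the metadata together with the values.
        fields = self.metadata["fields"]
        for value in self._values:
            yield dict(zip(fields, value))


def reduce_with_metadata(key, iterators):
    """Reduce one key by consuming a separate iterator for each data group."""
    merged = {"key": key}
    for it in iterators:
        merged[it.group_name] = list(it)
    return merged


# Hypothetical intermediate values for the common key 10, one iterator per group.
emp_it = GroupIterator("employee", [("Ann", 50), ("Cy", 60)],
                       {"fields": ["name", "salary"]})
dept_it = GroupIterator("department", [("Sales",)], {"fields": ["title"]})
result = reduce_with_metadata(10, [emp_it, dept_it])
```

Each iterator interprets its values according to its own group's schema, which is the sense in which the two iterators "operate according to a different key of a different schema" in this toy setting.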




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060036568A1

Filed: 2005-04-22     Issued: 2006-02-16

File system shell

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Jason Moore, Giampiero Sierra, Richard Banks, Lyon Wong, Relja Ivanovic, Paul Gusmorino, Tyler Beam, Timothy McKee, Jeffrey Belt, David De Vorchik, Chris Guzak, Aidan Low, Kenneth Tubbs, Colin Anthony, Sasanka Chalivendra, Marieke Watson, Gerald Joyce, Alex Wade, Benjamin Betz, Ahsan Kabir, Donna Andrews, Patrice Miner, Paul Cutsinger
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group (functional modules) has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (corresponding item, said list) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (user provides input) are performed by a distributed system .
US20060036568A1
CLAIM 9
. The file system shell browser of claim 8 , wherein said list (different lists) view slider comprises preset presentation styles including an iconic presentation style and a list presentation style .

US20060036568A1
CLAIM 14
. The file system shell browser of claim 1 , wherein , when a user provides input (reducing operations) focus to any of the data items in the first set of data items , the file system shell browser exposes in a commands module a set of commands corresponding to the data item having input focus .

US20060036568A1
CLAIM 25
. A user interface stored as computer executable instructions on one or more computer readable media , said user interface corresponding to a file system shell browser , and said user interface comprising : a primary view pane for displaying a plurality of data items corresponding to a presently selected virtual location ;
and three or more functional modules (first data group, first data set) displayed corresponding to each other , said functional modules selected from the set of : a page space control module , said page space control module providing a hierarchical tree of metadata properties and value , said tree navigable by a user to identify a selected metadata value , thereby causing corresponding items (different lists) to be displayed in the primary view pane ;
a virtual address bar module identifying the virtual location of the plurality of data items displayed in the primary view pane ;
a list view slider module providing a selectably changeable display element to allow a user to select a presentation style of the plurality of data items in the primary view pane ;
a virtual folder builder module exposing functionality for a user to define a virtual folder scope comprising one or more explicitly included storage locations and one or more explicitly excluded storage locations ;
and a preview module for displaying metadata corresponding to a selected one of the plurality of data items displayed in the primary view pane , wherein the preview module exposes a user interface through which a user can edit at least a portion of the metadata corresponding to the selected one of the plurality of data items .

US20060036568A1
CLAIM 30
. The user interface of claim 25 , wherein the primary view pane presents a first data (first data) item of the plurality of data items in an iconic form indicating a number of further data items corresponding to the one data item .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group (functional modules) has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (corresponding item, said list) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060036568A1
CLAIM 9
. The file system shell browser of claim 8 , wherein said list (different lists) view slider comprises preset presentation styles including an iconic presentation style and a list presentation style .

US20060036568A1
CLAIM 25
. A user interface stored as computer executable instructions on one or more computer readable media , said user interface corresponding to a file system shell browser , and said user interface comprising : a primary view pane for displaying a plurality of data items corresponding to a presently selected virtual location ;
and three or more functional modules (first data group, first data set) displayed corresponding to each other , said functional modules selected from the set of : a page space control module , said page space control module providing a hierarchical tree of metadata properties and value , said tree navigable by a user to identify a selected metadata value , thereby causing corresponding items (different lists) to be displayed in the primary view pane ;
a virtual address bar module identifying the virtual location of the plurality of data items displayed in the primary view pane ;
a list view slider module providing a selectably changeable display element to allow a user to select a presentation style of the plurality of data items in the primary view pane ;
a virtual folder builder module exposing functionality for a user to define a virtual folder scope comprising one or more explicitly included storage locations and one or more explicitly excluded storage locations ;
and a preview module for displaying metadata corresponding to a selected one of the plurality of data items displayed in the primary view pane , wherein the preview module exposes a user interface through which a user can edit at least a portion of the metadata corresponding to the selected one of the plurality of data items .

US20060036568A1
CLAIM 30
. The user interface of claim 25 , wherein the primary view pane presents a first data (first data) item of the plurality of data items in an iconic form indicating a number of further data items corresponding to the one data item .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data) set (functional modules) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (functional modules) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (user provides input) are performed by a distributed system .
US20060036568A1
CLAIM 4
. The file system shell browser of claim 3 , wherein when the user selects one of the hierarchically equivalent metadata values , the primary view pane presents a second display of a second set (second set) of data items corresponding to the one of the hierarchically equivalent metadata values .

US20060036568A1
CLAIM 14
. The file system shell browser of claim 1 , wherein , when a user provides input (reducing operations) focus to any of the data items in the first set of data items , the file system shell browser exposes in a commands module a set of commands corresponding to the data item having input focus .

US20060036568A1
CLAIM 25
. A user interface stored as computer executable instructions on one or more computer readable media , said user interface corresponding to a file system shell browser , and said user interface comprising : a primary view pane for displaying a plurality of data items corresponding to a presently selected virtual location ;
and three or more functional modules (first data group, first data set) displayed corresponding to each other , said functional modules selected from the set of : a page space control module , said page space control module providing a hierarchical tree of metadata properties and value , said tree navigable by a user to identify a selected metadata value , thereby causing corresponding items to be displayed in the primary view pane ;
a virtual address bar module identifying the virtual location of the plurality of data items displayed in the primary view pane ;
a list view slider module providing a selectably changeable display element to allow a user to select a presentation style of the plurality of data items in the primary view pane ;
a virtual folder builder module exposing functionality for a user to define a virtual folder scope comprising one or more explicitly included storage locations and one or more explicitly excluded storage locations ;
and a preview module for displaying metadata corresponding to a selected one of the plurality of data items displayed in the primary view pane , wherein the preview module exposes a user interface through which a user can edit at least a portion of the metadata corresponding to the selected one of the plurality of data items .

US20060036568A1
CLAIM 30
. The user interface of claim 25 , wherein the primary view pane presents a first data (first data) item of the plurality of data items in an iconic form indicating a number of further data items corresponding to the one data item .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set (functional modules) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (functional modules) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060036568A1
CLAIM 4
. The file system shell browser of claim 3 , wherein when the user selects one of the hierarchically equivalent metadata values , the primary view pane presents a second display of a second set (second set) of data items corresponding to the one of the hierarchically equivalent metadata values .

US20060036568A1
CLAIM 25
. A user interface stored as computer executable instructions on one or more computer readable media , said user interface corresponding to a file system shell browser , and said user interface comprising : a primary view pane for displaying a plurality of data items corresponding to a presently selected virtual location ;
and three or more functional modules (first data group, first data set) displayed corresponding to each other , said functional modules selected from the set of : a page space control module , said page space control module providing a hierarchical tree of metadata properties and value , said tree navigable by a user to identify a selected metadata value , thereby causing corresponding items to be displayed in the primary view pane ;
a virtual address bar module identifying the virtual location of the plurality of data items displayed in the primary view pane ;
a list view slider module providing a selectably changeable display element to allow a user to select a presentation style of the plurality of data items in the primary view pane ;
a virtual folder builder module exposing functionality for a user to define a virtual folder scope comprising one or more explicitly included storage locations and one or more explicitly excluded storage locations ;
and a preview module for displaying metadata corresponding to a selected one of the plurality of data items displayed in the primary view pane , wherein the preview module exposes a user interface through which a user can edit at least a portion of the metadata corresponding to the selected one of the plurality of data items .

US20060036568A1
CLAIM 30
. The user interface of claim 25 , wherein the primary view pane presents a first data (first data) item of the plurality of data items in an iconic form indicating a number of further data items corresponding to the one data item .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005353039A

Filed: 2005-04-18     Issued: 2005-12-22

Data overlay, self-organized metadata overlay, and application level multicasting

(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     

Yu Chen, Shiding Lin, Xing Xie, Zheng Zhang, リン シディング, シエ シン, チャン チェン, チェン ユー
US8190610B2
CLAIM 1
. A method of processing data of a data set (アップ) over a distributed system , wherein the data set comprises a plurality of data groups (なるグループ) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (有する単一) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005353039A
CLAIM 1
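A method comprising: building a data overlay as a data structure on top of a logical space included in a distributed hash table (DHT) for a peer-to-peer system, wherein the logical space includes a plurality of DHT nodes having an associated plurality of DHT zones; building, in the data overlay, a topology of a tree having a plurality of levels, each level including one or more tree nodes associated with respective DHT nodes, wherein the first level of the tree includes a single [有する単一] (corresponding different intermediate data) tree node having a single tree node zone that corresponds to the entire span of the logical space of the DHT and is logically divided into a plurality of tree node zones, each of the plurality of tree node zones corresponds to the tree nodes at each level of the tree and to a part of the logical space of the DHT, and each tree node includes a key member that identifies a key associated with its respective tree node zone; and mapping a plurality of machines to the logical space of the DHT, wherein each machine corresponds to one or more of the tree node zones, each machine selects as its representative node, from the one or more tree node zones corresponding to it, the tree node corresponding to the largest tree node zone, and each representative node selects as its parent node another representative node that is the representative node for an adjacent tree node zone having a larger size.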

JP2005353039A
CLAIM 8
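The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.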

JP2005353039A
CLAIM 11
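The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.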
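
For orientation only, the grouped map and reduce flow recited in claim 1 of US8190610B2 can be pictured with a minimal Python sketch. It is not taken from the patent or from any cited reference. The data groups, their schemas, the shared key `dept_id`, and every function name below are hypothetical assumptions chosen for illustration, and the sketch runs in a single process rather than on a distributed system.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key "dept_id".
EMPLOYEES = [  # schema: (emp_id, name, dept_id)
    (1, "Ann", 10),
    (2, "Bob", 20),
    (3, "Cy", 10),
]
DEPARTMENTS = [  # schema: (dept_id, dept_name)
    (10, "Search"),
    (20, "Ads"),
]


def map_employee(record):
    """Map function for the employee group: emit (dept_id, name)."""
    _emp_id, name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Map function for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def partition(records, n):
    """Split one group's records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def run_map(groups, n_partitions=2):
    """Apply each group's own map function to each of its partitions.

    Intermediate values stay identifiable to their group:
    intermediate[key][group_name] is a list of values.
    """
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for record in part:
                for key, value in map_fn(record):
                    intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Merge both groups' values for one common key (an inner join)."""
    for dept_name in values_by_group.get("departments", []):
        for name in values_by_group.get("employees", []):
            yield key, name, dept_name


def map_reduce(groups):
    """Run the map phase, then reduce every key in sorted order."""
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


GROUPS = {
    "employees": (EMPLOYEES, map_employee),
    "departments": (DEPARTMENTS, map_department),
}
RESULT = map_reduce(GROUPS)
```

Under these assumptions `RESULT` holds one `(dept_id, name, dept_name)` row per employee, an output schema that differs from both input schemas. A reference that merely shares vocabulary such as "group" or "lookup" would not by itself exhibit this structure.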

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (なるグループ) .
JP2005353039A
CLAIM 11
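The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.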

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (なるグループ) .
JP2005353039A
CLAIM 11
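The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.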

US8190610B2
CLAIM 17
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set (アップ) , wherein the data set comprises a plurality of data groups (なるグループ) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (有する単一) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005353039A
CLAIM 1
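A method comprising: building a data overlay as a data structure on top of a logical space included in a distributed hash table (DHT) for a peer-to-peer system, wherein the logical space includes a plurality of DHT nodes having an associated plurality of DHT zones; building, in the data overlay, a topology of a tree having a plurality of levels, each level including one or more tree nodes associated with respective DHT nodes, wherein the first level of the tree includes a single [有する単一] (corresponding different intermediate data) tree node having a single tree node zone that corresponds to the entire span of the logical space of the DHT and is logically divided into a plurality of tree node zones, each of the plurality of tree node zones corresponds to the tree nodes at each level of the tree and to a part of the logical space of the DHT, and each tree node includes a key member that identifies a key associated with its respective tree node zone; and mapping a plurality of machines to the logical space of the DHT, wherein each machine corresponds to one or more of the tree node zones, each machine selects as its representative node, from the one or more tree node zones corresponding to it, the tree node corresponding to the largest tree node zone, and each representative node selects as its parent node another representative node that is the representative node for an adjacent tree node zone having a larger size.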

JP2005353039A
CLAIM 8
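The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.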

JP2005353039A
CLAIM 9
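The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.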

JP2005353039A
CLAIM 11
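The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.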

US8190610B2
CLAIM 18
. The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups (なるグループ) .
JP2005353039A
CLAIM 9
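The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.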

JP2005353039A
CLAIM 11
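The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.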

US8190610B2
CLAIM 19
. The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005353039A
CLAIM 9
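The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.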

US8190610B2
CLAIM 20
. The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005353039A
CLAIM 9
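The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.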

US8190610B2
CLAIM 21
. The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005353039A
CLAIM 9
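The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.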

US8190610B2
CLAIM 22
. The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005353039A
CLAIM 9
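The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.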
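
As a reading aid for the iterator language of claims 21 and 22 of US8190610B2, the following minimal Python sketch shows a reducer that receives a separate iterator for each data group under one key. The sketch is an assumption-laden illustration, not the patentee's implementation. The group names `customers` and `orders`, the sample values, and the helper names are all hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical intermediate data held by one reducer, kept separate per data group.
INTERMEDIATE: Dict[str, Dict[int, List]] = {
    "customers": {7: ["Dana"], 8: ["Eli"]},         # key -> [name]
    "orders": {7: [("book", 12.5), ("pen", 1.5)]},  # key -> [(item, price)]
}


def iterator_for(group: str, key: int) -> Iterator:
    """Return an iterator over one group's intermediate values for a key."""
    return iter(INTERMEDIATE[group].get(key, []))


def reduce_key(key: int) -> Tuple[int, str, float]:
    """Reduce one key using a different iterator for each data group."""
    customers = iterator_for("customers", key)
    orders = iterator_for("orders", key)
    # The customer iterator is consumed once to read the single name value.
    name = next(customers, "unknown")
    # The order iterator is streamed in full to total the prices.
    total = sum(price for _item, price in orders)
    return key, name, total


OUTPUT = [reduce_key(k) for k in sorted(INTERMEDIATE["customers"])]
```

Each iterator here is consumed according to the shape of its own group's values, which is the property the claim language ties to "an iterator that corresponds to that data group". How a real system would arrange or nest such iterators is left open by this sketch.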

US8190610B2
CLAIM 23
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005353039A
CLAIM 9
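The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.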

US8190610B2
CLAIM 24
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005353039A
CLAIM 9
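The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.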

US8190610B2
CLAIM 25
. The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2005353039A
CLAIM 9
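The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.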
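
Claims 23 to 25 of US8190610B2 distinguish three ways of distributing intermediate key/value pairs among reducer partitions: each pair to a separate partition, some pairs to more than one partition, and all pairs to all partitions. The Python sketch below illustrates one possible reading of each variant. The modulo rule on integer keys, the `replicated_keys` parameter, and the function names are assumptions made for illustration and do not come from the patent.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, str]


def partition_disjoint(pairs: List[Pair], n: int) -> List[List[Pair]]:
    """Claim 23 style: every pair goes to exactly one partition."""
    parts: List[List[Pair]] = [[] for _ in range(n)]
    for key, value in pairs:
        parts[key % n].append((key, value))  # modulo on integer keys (assumed rule)
    return parts


def partition_overlapping(pairs: List[Pair], n: int,
                          replicated_keys: Set[int]) -> List[List[Pair]]:
    """Claim 24 style: pairs with selected keys are copied to every partition."""
    parts = partition_disjoint(pairs, n)
    for key, value in pairs:
        if key in replicated_keys:
            home = key % n
            for i in range(n):
                if i != home:  # the home partition already holds the pair
                    parts[i].append((key, value))
    return parts


def partition_broadcast(pairs: List[Pair], n: int) -> List[List[Pair]]:
    """Claim 25 style: all pairs go to all partitions."""
    return [list(pairs) for _ in range(n)]


PAIRS: List[Pair] = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
DISJOINT = partition_disjoint(PAIRS, 2)
OVERLAPPING = partition_overlapping(PAIRS, 2, {1})
BROADCAST = partition_broadcast(PAIRS, 2)
```

Each resulting partition would then be handed to a separate reducer. The overlapping and broadcast variants are the ones that let a reducer see values whose keys it does not own.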

US8190610B2
CLAIM 26
. The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005353039A
CLAIM 9
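The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.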

US8190610B2
CLAIM 27
. The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JP2005353039A
CLAIM 9
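The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.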

US8190610B2
CLAIM 28
. The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005353039A
CLAIM 9
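The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.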

US8190610B2
CLAIM 29
. The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
JP2005353039A
CLAIM 9
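The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.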

US8190610B2
CLAIM 30
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2005353039A
CLAIM 9
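The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.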

US8190610B2
CLAIM 31
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2005353039A
CLAIM 9
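The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.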

US8190610B2
CLAIM 32
. The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (なるグループ) .
JP2005353039A
CLAIM 9
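The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.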

JP2005353039A
CLAIM 11
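The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.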

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (行うこと) , the method comprising : for a first data set (アップ) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005353039A
CLAIM 8
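The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.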

JP2005353039A
CLAIM 9
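The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.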

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (アップ) so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005353039A
CLAIM 8
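The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.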

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (アップ) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005353039A
CLAIM 8
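The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.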

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (アップ) are provided to all of the reducers .
JP2005353039A
CLAIM 8
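The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.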

US8190610B2
CLAIM 40
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set (アップ) , wherein the data set comprises a plurality of data groups (なるグループ) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005353039A
CLAIM 8
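The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.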

JP2005353039A
CLAIM 9
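The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.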

JP2005353039A
CLAIM 11
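The method of claim 10, wherein the optimization function is based on criteria selected from the group [なるグループ] (data groups, output data groups) consisting of network coordinates, bandwidth bottleneck, maximal latency, and variance of latencies, whereby the most resource-demanding tasks are performed by the most resource-available machines in the peer-to-peer system.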

US8190610B2
CLAIM 41
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (アップ) so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005353039A
CLAIM 8
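The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.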

JP2005353039A
CLAIM 9
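The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.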

US8190610B2
CLAIM 42
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (アップ) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005353039A
CLAIM 8
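The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.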

JP2005353039A
CLAIM 9
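The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.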

US8190610B2
CLAIM 43
. The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (アップ) are provided to all of the reducers .
JP2005353039A
CLAIM 8
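The method of claim 7, wherein the step of computing each key further comprises the machine using a lookup [ルックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) in the DHT to obtain information, the machine using the information with the key of the corresponding representative node to establish communication with the machine corresponding to that representative node.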

JP2005353039A
CLAIM 9
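The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.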

US8190610B2
CLAIM 44
. The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005353039A
CLAIM 9
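The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.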

US8190610B2
CLAIM 45
. The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JP2005353039A
CLAIM 9
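The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.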

US8190610B2
CLAIM 46
. The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2005353039A
CLAIM 9
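The method of claim 1, further comprising: receiving, at each machine, a heartbeat transmission from each machine in an adjacent tree node zone; and, when any heartbeat transmission is not received in a timely manner, accounting for the absence of the corresponding machine in the adjacent tree node zone by performing [行うこと] (computer system) the steps of: repeating the providing of the DHT; repeating the building of the data overlay as the data structure on top of the logical space of the DHT; repeating the building of a multilevel tree in the rebuilt data overlay; and repeating the mapping of the plurality of machines to the logical space of the DHT.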




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
WO2005098652A2

Filed: 2005-03-25     Issued: 2005-10-20

Providing enterprise information

(Original Assignee) Cxo Systems, Inc.     

Alok Batra, Olagappan Manickam, Danko Zlokapa, Rajendra Kulkarni, Chetan Gadgil
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (further process) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
WO2005098652A2
CLAIM 18
. A method comprising : processing enterprise data from distributed repositories in an assembly line fashion to produce management data that is useful in managing at least a portion of the enterprise , the assembly line including separate executable agents to perform tasks on the data , the agents including : a cleansing agent to process data that would not otherwise be useful in producing the management data , a normalizing agent to normalize the data , a transformation agent to enhance the consistency of the data , an assembler agent to assemble data to form the management data , and a staging agent to form and stage data for further process (different intermediate data) ing , the sequence and tasks of the agents in the pipeline being adaptable to changes in the portion of the enterprise being managed .
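
As a reading aid for claim 1 of US8190610B2 quoted above, the following is a minimal single-process sketch of the claimed flow: two data groups with different schemas are partitioned, mapped by group-specific mapping functions, kept identifiable to their group, and merged in a reduce step on a key in common. The sample records, the group labels `emp` and `dept`, and every function name are hypothetical and are not taken from the patent or from either reference; the sketch does not model distribution across machines.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas.
# Both can be related through a department id (the key in common).
EMPLOYEES = [(1, {"name": "Ann", "dept_id": 10}), (2, {"name": "Bob", "dept_id": 20})]
DEPARTMENTS = [(10, {"title": "Sales"}), (20, {"title": "Ops"})]


def map_employee(key, record):
    # Group-specific mapper: re-key each employee by its department id.
    yield record["dept_id"], record["name"]


def map_department(key, record):
    # Group-specific mapper: the record key is already the department id.
    yield key, record["title"]


def partition(pairs, n):
    # Split one group's key-value pairs into n partitions (round robin).
    return [pairs[i::n] for i in range(n)]


def map_phase(groups, n_partitions=2):
    # intermediate[key][group] -> list of values, so the intermediate data
    # remains identifiable to the data group that produced it.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, (pairs, mapper) in groups.items():
        for part in partition(pairs, n_partitions):
            for key, record in part:
                for out_key, out_value in mapper(key, record):
                    intermediate[out_key][group].append(out_value)
    return intermediate


def reduce_join(key, values_by_group):
    # Iterate over each group's values separately, then merge on the key.
    names = values_by_group.get("emp", [])
    titles = values_by_group.get("dept", [])
    return [(key, name, title) for name in names for title in titles]


def run():
    groups = {
        "emp": (EMPLOYEES, map_employee),
        "dept": (DEPARTMENTS, map_department),
    }
    intermediate = map_phase(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


if __name__ == "__main__":
    print(run())
```

The per-group value lists inside `reduce_join` stand in for the group-specific iterators described in the abstract; a chart reviewer can compare that structure against whatever a cited reference actually discloses.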

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (further process) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
WO2005098652A2
CLAIM 18
. A method comprising : processing enterprise data from distributed repositories in an assembly line fashion to produce management data that is useful in managing at least a portion of the enterprise , the assembly line including separate executable agents to perform tasks on the data , the agents including : a cleansing agent to process data that would not otherwise be useful in producing the management data , a normalizing agent to normalize the data , a transformation agent to enhance the consistency of the data , an assembler agent to assemble data to form the management data , and a staging agent to form and stage data for further process (different intermediate data) ing , the sequence and tasks of the agents in the pipeline being adaptable to changes in the portion of the enterprise being managed .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (different one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
WO2005098652A2
CLAIM 9
. A method comprising from distributed repositories of data related to an enterprise , obtaining current data to be used in connection with managing at least a portion of the enterprise , the data from different one (first set) s of the repositories having formal and temporal inconsistencies , enhancing the formal consistency of data received from different ones of the repositories , temporarily storing portions of the enhanced data to enhance temporal consistency of the data , using a model of the portion of the enterprise to analyze the temporally and formally enhanced data and to generate resulting management data , distributing the management data in a time frame that is current relative to the current data obtained from the repositories , and the identity of the current data changing adaptively over time based on the model and on the resulting management data that is to be distributed .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (different one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
WO2005098652A2
CLAIM 9
. A method comprising from distributed repositories of data related to an enterprise , obtaining current data to be used in connection with managing at least a portion of the enterprise , the data from different one (first set) s of the repositories having formal and temporal inconsistencies , enhancing the formal consistency of data received from different ones of the repositories , temporarily storing portions of the enhanced data to enhance temporal consistency of the data , using a model of the portion of the enterprise to analyze the temporally and formally enhanced data and to generate resulting management data , distributing the management data in a time frame that is current relative to the current data obtained from the repositories , and the identity of the current data changing adaptively over time based on the model and on the resulting management data that is to be distributed .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1566752A2

Filed: 2005-02-11     Issued: 2005-08-24

Rapid visual sorting of digital files and data

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Corp

Asta J. Roseway, Curtis Wong, Steven C. Glenner, Steven D. Demar, Steven M. Drucker
US8190610B2
CLAIM 1
. A method of processing data of a data set (respective sets, two sets) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set (respective sets, two sets) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set (respective sets, two sets) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .
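
Illustrative note (not part of either cited document): the method recited in US8190610B2 claim 33 can be read as mapping two data sets with different schemas independently and then reducing them together on a shared key. The following is a minimal single-process Python sketch under that reading. The sample records, the field names (dept_id, name, title), and every function name are hypothetical, and a real deployment would run the map and reduce tasks on a distributed system as the claim requires.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share the key "dept_id".
employees = [
    {"dept_id": 1, "name": "Ann"},
    {"dept_id": 2, "name": "Bob"},
    {"dept_id": 1, "name": "Cy"},
]
departments = [
    {"dept_id": 1, "title": "Sales"},
    {"dept_id": 2, "title": "Ops"},
]


def partition(records, n):
    """Split a data set into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    """Map function for the first schema: emit (dept_id, name)."""
    return [(r["dept_id"], r["name"]) for r in part]


def map_departments(part):
    """Map function for the second schema: emit (dept_id, title)."""
    return [(r["dept_id"], r["title"]) for r in part]


def run_map(records, map_fn, n=2):
    """Apply map_fn to each partition and collect a list of values per key."""
    grouped = defaultdict(list)
    for part in partition(records, n):
        for key, value in map_fn(part):
            grouped[key].append(value)
    return grouped


def reduce_join(key, names, titles):
    """Iterate over both groups' values for one key and emit joined rows."""
    return [(key, title, name) for title in titles for name in names]


def map_reduce_join(first, second):
    inter_first = run_map(first, map_employees)      # first intermediate data set
    inter_second = run_map(second, map_departments)  # second intermediate data set
    output = []
    # Reduce both intermediate data sets together on the keys they share.
    for key in sorted(inter_first.keys() & inter_second.keys()):
        output.extend(reduce_join(key, inter_first[key], inter_second[key]))
    return output


result = map_reduce_join(employees, departments)
```

In this sketch the output rows (dept_id, title, name) have a schema that differs from both inputs, which corresponds to the limitation that the output data set has a different schema than the first and second schema.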

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (respective sets, two sets) so that the output data set is a merging of a portion of the first and second intermediate data set .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (respective sets, two sets) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (respective sets, two sets) are provided to all of the reducers .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .
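
Illustrative note (not part of either cited document): claims 35 and 36 of US8190610B2 concern how intermediate key-value pairs are distributed to a plurality of reducers, including the case where some or all pairs are provided to more than one partition. A minimal Python sketch of one possible reading follows. The function name assign_partitions, its replicate flag, the modulo-based partitioning rule, and the integer keys are all assumptions made for illustration.

```python
def assign_partitions(pairs, num_reducers, replicate=False):
    """Return one list of key-value pairs per reducer.

    replicate=False: each pair goes to exactly one partition chosen by its key
                     (keys are assumed to be integers in this sketch).
    replicate=True:  each pair is copied into every partition.
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        if replicate:
            for part in partitions:
                part.append((key, value))
        else:
            partitions[key % num_reducers].append((key, value))
    return partitions


first_intermediate = [(1, "a"), (2, "b"), (3, "c")]
second_intermediate = [(1, "x"), (3, "y")]

# Claim 35 reading: the first set is split by key, while the pairs of the
# second set are provided to more than one partition.
first_parts = assign_partitions(first_intermediate, 2)
second_parts = assign_partitions(second_intermediate, 2, replicate=True)

# Each reducer receives its own partition of both intermediate data sets.
reducer_inputs = list(zip(first_parts, second_parts))
```

Calling assign_partitions with replicate=True for both intermediate data sets would correspond to the claim 36 case in which all key-value pairs are provided to all of the reducers.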

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set (respective sets, two sets) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (respective sets, two sets) so that the output data set is a merging of a portion of the first and second intermediate data set .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (respective sets, two sets) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (respective sets, two sets) are provided to all of the reducers .
EP1566752A2
CLAIM 9
The method of claim 1 further comprising , displaying images representative of at least two sets (data set) of digital data , with one image being shown as a currently selected image .

EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets (data set) of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
EP1566752A2
CLAIM 15
A method in a computer system (computer system) having a graphical user interface including at least one display and at least one user interface selection device , comprising : displaying a plurality of images representing respective sets of digital data ;
scrolling the plurality of images based on user scrolling instructions received via a selection device ;
and sorting at least some of the sets of digital data based on user sorting instructions received via a selection device ;
and automatically maintaining metadata in association with the digital data based on the user sorting instructions .
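
Illustrative note (not part of either cited document): claims 44 to 46 of US8190610B2 recite generating metadata for steps such as sorting and grouping, and having the iterator provide that metadata to the reduce processing. The Python sketch below shows one way this could look. The metadata fields (group, sorted_by_key, num_keys) and all function names are hypothetical choices for illustration, not terms taken from the patent.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_group(pairs, group_name):
    """Sort pairs by key, group values per key, and record metadata."""
    ordered = sorted(pairs, key=itemgetter(0))
    grouped = {
        key: [value for _, value in items]
        for key, items in groupby(ordered, key=itemgetter(0))
    }
    # Hypothetical metadata describing how this group was processed.
    metadata = {"group": group_name, "sorted_by_key": True, "num_keys": len(grouped)}
    return grouped, metadata


def iterate_with_metadata(grouped, metadata, key):
    """Iterator for one data group: yields each value with its metadata."""
    for value in grouped.get(key, []):
        yield metadata, value


def reduce_key(key, iterators):
    """Reduce step that consumes one iterator per data group."""
    rows = []
    for iterator in iterators:
        for metadata, value in iterator:
            rows.append((key, metadata["group"], value))
    return rows


g1, m1 = sort_and_group([(2, "b"), (1, "a"), (1, "c")], "first")
g2, m2 = sort_and_group([(1, "x")], "second")
rows = reduce_key(
    1,
    [iterate_with_metadata(g1, m1, 1), iterate_with_metadata(g2, m2, 1)],
)
```

Here the reducer can tell which data group each value came from only because the iterator hands over the metadata alongside the value.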




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2006195890A

Filed: 2005-01-17     Issued: 2006-07-27

Information processing apparatus, system, data synchronization method, and program

(Original Assignee) Fuji Xerox Co Ltd; 富士ゼロックス株式会社     

Masaru Fukami, Atsushi Nakamura, Takeshi Nishizawa, Noriyasu Tsuboyama, 淳 中村, 徳保 坪山, 大 深見, 剛 西沢
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (前記第2グループ, 要求手段) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2006195890A
CLAIM 4
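An information processing apparatus connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as the apparatus itself, the information processing apparatus comprising: storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them; acquisition request means [取得要求手段] (different schema) for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.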

JP2006195890A
CLAIM 6
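The information processing apparatus according to claim 4, wherein the information processing apparatus also belongs to a second group that encompasses the first group, and is also connected via a communication line to a plurality of other information processing apparatuses that belong to the second group [前記第2グループ] (different schema) but not to the first group, the information processing apparatus further comprising: second notification means for, after the receiving means has received from the specific information processing apparatus shared data that each information processing apparatus belonging to the second group should hold and that should be kept synchronized among them and has stored it in the storage means, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses that belong to the second group but not to the first group; and second transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the second notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request.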

JP2006195890A
CLAIM 10
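A data synchronization program that causes a computer [コンピュータ] (processing data), which is connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as itself and which comprises storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them, to function as: first notification means for, when the shared data stored in the storage means is updated, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses belonging to the first group; first transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the first notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request; acquisition request means for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.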

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (前記第2グループ, 要求手段) than the iterator corresponding to another particular data group , for that reducer .
JP2006195890A
CLAIM 4
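An information processing apparatus connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as the apparatus itself, the information processing apparatus comprising: storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them; acquisition request means [取得要求手段] (different schema) for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.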

JP2006195890A
CLAIM 6
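The information processing apparatus according to claim 4, wherein the information processing apparatus also belongs to a second group that encompasses the first group, and is also connected via a communication line to a plurality of other information processing apparatuses that belong to the second group [前記第2グループ] (different schema) but not to the first group, the information processing apparatus further comprising: second notification means for, after the receiving means has received from the specific information processing apparatus shared data that each information processing apparatus belonging to the second group should hold and that should be kept synchronized among them and has stored it in the storage means, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses that belong to the second group but not to the first group; and second transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the second notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request.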

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2006195890A
CLAIM 10
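A data synchronization program that causes a computer [コンピュータ] (processing data), which is connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as itself and which comprises storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them, to function as: first notification means for, when the shared data stored in the storage means is updated, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses belonging to the first group; first transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the first notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request; acquisition request means for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.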

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (前記第2グループ, 要求手段) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2006195890A
CLAIM 4
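An information processing apparatus connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as the apparatus itself, the information processing apparatus comprising: storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them; acquisition request means [取得要求手段] (different schema) for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.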

JP2006195890A
CLAIM 6
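The information processing apparatus according to claim 4, wherein the information processing apparatus also belongs to a second group that encompasses the first group, and is also connected via a communication line to a plurality of other information processing apparatuses that belong to the second group [前記第2グループ] (different schema) but not to the first group, the information processing apparatus further comprising: second notification means for, after the receiving means has received from the specific information processing apparatus shared data that each information processing apparatus belonging to the second group should hold and that should be kept synchronized among them and has stored it in the storage means, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses that belong to the second group but not to the first group; and second transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the second notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request.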

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (前記第2グループ, 要求手段) than the iterator corresponding to another particular data group , for that reducer .
JP2006195890A
CLAIM 4
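An information processing apparatus connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as the apparatus itself, the information processing apparatus comprising: storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them; acquisition request means [取得要求手段] (different schema) for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.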

JP2006195890A
CLAIM 6
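The information processing apparatus according to claim 4, wherein the information processing apparatus also belongs to a second group that encompasses the first group, and is also connected via a communication line to a plurality of other information processing apparatuses that belong to the second group [前記第2グループ] (different schema) but not to the first group, the information processing apparatus further comprising: second notification means for, after the receiving means has received from the specific information processing apparatus shared data that each information processing apparatus belonging to the second group should hold and that should be kept synchronized among them and has stored it in the storage means, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses that belong to the second group but not to the first group; and second transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the second notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request.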

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2006195890A
CLAIM 10
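A data synchronization program that causes a computer [コンピュータ] (processing data), which is connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as itself and which comprises storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them, to function as: first notification means for, when the shared data stored in the storage means is updated, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses belonging to the first group; first transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the first notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request; acquisition request means for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.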

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema (前記第2グループ, 要求手段) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2006195890A
CLAIM 4
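An information processing apparatus connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as the apparatus itself, the information processing apparatus comprising: storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them; acquisition request means [取得要求手段] (different schema) for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.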

JP2006195890A
CLAIM 6
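The information processing apparatus according to claim 4, wherein the information processing apparatus also belongs to a second group that encompasses the first group, and is also connected via a communication line to a plurality of other information processing apparatuses that belong to the second group [前記第2グループ] (different schema) but not to the first group, the information processing apparatus further comprising: second notification means for, after the receiving means has received from the specific information processing apparatus shared data that each information processing apparatus belonging to the second group should hold and that should be kept synchronized among them and has stored it in the storage means, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses that belong to the second group but not to the first group; and second transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the second notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request.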

JP2006195890A
CLAIM 10
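A data synchronization program that causes a computer [コンピュータ] (processing data), which is connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as itself and which comprises storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them, to function as: first notification means for, when the shared data stored in the storage means is updated, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses belonging to the first group; first transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the first notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request; acquisition request means for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.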

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (前記第2グループ, 要求手段) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2006195890A
CLAIM 4
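An information processing apparatus connected via a communication line to a plurality of other information processing apparatuses belonging to the same first group as the apparatus itself, the information processing apparatus comprising: storage means capable of storing shared data that each information processing apparatus belonging to the first group should hold and that should be kept synchronized among them; acquisition request means [取得要求手段] (different schema) for, upon receiving from a specific information processing apparatus belonging to the first group notification information indicating that the shared data held by the specific information processing apparatus has been updated, transmitting a request to acquire the shared data to the specific information processing apparatus after a randomly determined waiting time, or a waiting time preset so as to differ for each individual information processing apparatus belonging to the first group, has elapsed; and receiving means for receiving the shared data transmitted from the specific information processing apparatus in response to the acquisition request transmitted by the acquisition request means, and storing the received shared data in the storage means.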

JP2006195890A
CLAIM 6
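The information processing apparatus according to claim 4, wherein the information processing apparatus also belongs to a second group that encompasses the first group, and is also connected via a communication line to a plurality of other information processing apparatuses that belong to the second group [前記第2グループ] (different schema) but not to the first group, the information processing apparatus further comprising: second notification means for, after the receiving means has received from the specific information processing apparatus shared data that each information processing apparatus belonging to the second group should hold and that should be kept synchronized among them and has stored it in the storage means, transmitting notification information indicating that the shared data has been updated to all other information processing apparatuses that belong to the second group but not to the first group; and second transmitting means for, each time a request to acquire the shared data is received from an information processing apparatus to which the second notification means transmitted the notification information, transmitting the shared data stored in the storage means to the information processing apparatus that sent the acquisition request.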




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005235171A

Filed: 2004-12-08     Issued: 2005-09-02

Method and apparatus for generating content addresses indicating data units written to a storage system in close temporal proximity

(Original Assignee) EMC Corp; イーエムシー コーポレイション (EMC Corporation)

Carl D'halluin, Michael Kilian, Jan Van Riel, Tom Teugels, Stephen Todd, カール、ダルイン, トッド スティーブン, トゥーゲルス トム, キリアン マイケル, バン リール ヤン
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005235171A
CLAIM 2
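The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when calculating where in the storage system the one data unit should be stored.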

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2005235171A
CLAIM 2
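The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when calculating where in the storage system the one data unit should be stored.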

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2005235171A
CLAIM 2
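The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.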

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2005235171A
CLAIM 2
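The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.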

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2005235171A
CLAIM 2
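The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.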
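
Claim 5 of US8190610B2 above adds that the reducer employs an iterator corresponding to each data group. A minimal sketch of that idea follows, assuming two hypothetical groups (`customers`, `orders`) whose intermediate values need different handling; the helper `group_iterator` and the reducer name are illustrative and are not taken from the patent.

```python
from typing import Callable, Dict, Iterable, Iterator, List

def group_iterator(values: Iterable, transform: Callable) -> Iterator:
    """Iterator bound to one data group; transform is that group's own handling."""
    for value in values:
        yield transform(value)

def reduce_customer(key: str, grouped: Dict[str, List[str]]):
    # One iterator per data group, each defined to match that group's values.
    names = group_iterator(grouped.get("customers", []), str.strip)
    amounts = group_iterator(grouped.get("orders", []), float)
    total = sum(amounts)  # consumes the "orders" iterator
    # Merge on the common key: one output row per customer name.
    return [(key, name, total) for name in names]

out = reduce_customer("c1", {"customers": [" Ann "], "orders": ["10.5", "4.5"]})
```

The reducer never needs to inspect a value to find out which group it came from, because each group's values arrive through their own iterator.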

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2005235171A
CLAIM 2
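The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.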

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005235171A
CLAIM 2
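The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.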

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data that is not intermediate data (含むコンピュータ) .
JP2005235171A
CLAIM 6
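At least one computer-readable medium [含むコンピュータ] (not intermediate data) encoded with instructions that, when executed on a computer system, perform a method of processing data, wherein the computer system has at least one host and at least one associative storage device that stores data units of the at least one host, the at least one host accesses the data units using content addresses based at least in part on the content of the data units, and the at least one storage system has an index that maps the content addresses of the data units to storage locations within the at least one storage system at which the data units are stored, the method comprising: (a) an act of receiving, from the at least one host, a request to store one of the data units, the request identifying the one of the data units by the content address associated with that data unit; and (b) an act of storing the one of the data units at a storage location selected so that the entry for that location in the index is close to the index entries of other data units written to the at least one storage system close in time to the one of the data units.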

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with another reducer .
JP2005235171A
CLAIM 6
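At least one computer-readable medium [含むコンピュータ] (not intermediate data) encoded with instructions that, when executed on a computer system, perform a method of processing data, wherein the computer system has at least one host and at least one associative storage device that stores data units of the at least one host, the at least one host accesses the data units using content addresses based at least in part on the content of the data units, and the at least one storage system has an index that maps the content addresses of the data units to storage locations within the at least one storage system at which the data units are stored, the method comprising: (a) an act of receiving, from the at least one host, a request to store one of the data units, the request identifying the one of the data units by the content address associated with that data unit; and (b) an act of storing the one of the data units at a storage location selected so that the entry for that location in the index is close to the index entries of other data units written to the at least one storage system close in time to the one of the data units.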

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with that reducer .
JP2005235171A
CLAIM 6
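At least one computer-readable medium [含むコンピュータ] (not intermediate data) encoded with instructions that, when executed on a computer system, perform a method of processing data, wherein the computer system has at least one host and at least one associative storage device that stores data units of the at least one host, the at least one host accesses the data units using content addresses based at least in part on the content of the data units, and the at least one storage system has an index that maps the content addresses of the data units to storage locations within the at least one storage system at which the data units are stored, the method comprising: (a) an act of receiving, from the at least one host, a request to store one of the data units, the request identifying the one of the data units by the content address associated with that data unit; and (b) an act of storing the one of the data units at a storage location selected so that the entry for that location in the index is close to the index entries of other data units written to the at least one storage system close in time to the one of the data units.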

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005235171A
CLAIM 2
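The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.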

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2005235171A
CLAIM 2
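The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.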

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005235171A
CLAIM 2
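The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.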

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005235171A
CLAIM 2
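The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.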

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005235171A
CLAIM 2
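The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.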

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005235171A
CLAIM 2
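The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.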

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005235171A
CLAIM 2
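The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.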

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data (含むコンピュータ) .
JP2005235171A
CLAIM 6
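At least one computer-readable medium [含むコンピュータ] (not intermediate data) encoded with instructions that, when executed on a computer system, perform a method of processing data, wherein the computer system has at least one host and at least one associative storage device that stores data units of the at least one host, the at least one host accesses the data units using content addresses based at least in part on the content of the data units, and the at least one storage system has an index that maps the content addresses of the data units to storage locations within the at least one storage system at which the data units are stored, the method comprising: (a) an act of receiving, from the at least one host, a request to store one of the data units, the request identifying the one of the data units by the content address associated with that data unit; and (b) an act of storing the one of the data units at a storage location selected so that the entry for that location in the index is close to the index entries of other data units written to the at least one storage system close in time to the one of the data units.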

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with another reducer .
JP2005235171A
CLAIM 6
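At least one computer-readable medium [含むコンピュータ] (not intermediate data) encoded with instructions that, when executed on a computer system, perform a method of processing data, wherein the computer system has at least one host and at least one associative storage device that stores data units of the at least one host, the at least one host accesses the data units using content addresses based at least in part on the content of the data units, and the at least one storage system has an index that maps the content addresses of the data units to storage locations within the at least one storage system at which the data units are stored, the method comprising: (a) an act of receiving, from the at least one host, a request to store one of the data units, the request identifying the one of the data units by the content address associated with that data unit; and (b) an act of storing the one of the data units at a storage location selected so that the entry for that location in the index is close to the index entries of other data units written to the at least one storage system close in time to the one of the data units.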

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data (含むコンピュータ) , for a particular reducer , includes data that is associated with that reducer .
JP2005235171A
CLAIM 6
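At least one computer-readable medium [含むコンピュータ] (not intermediate data) encoded with instructions that, when executed on a computer system, perform a method of processing data, wherein the computer system has at least one host and at least one associative storage device that stores data units of the at least one host, the at least one host accesses the data units using content addresses based at least in part on the content of the data units, and the at least one storage system has an index that maps the content addresses of the data units to storage locations within the at least one storage system at which the data units are stored, the method comprising: (a) an act of receiving, from the at least one host, a request to store one of the data units, the request identifying the one of the data units by the content address associated with that data unit; and (b) an act of storing the one of the data units at a storage location selected so that the entry for that location in the index is close to the index entries of other data units written to the at least one storage system close in time to the one of the data units.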

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005235171A
CLAIM 2
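The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.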

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005235171A
CLAIM 2
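The at least one computer-readable medium of claim 1, wherein the method further comprises: (b) an act of accessing one of the data units by providing [与えること] (data group, first data group) the content address of the one data unit to the storage system; and (c) an act of treating the first information as the content address of the one data unit when computing where in the storage system the one data unit is to be stored.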




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2006041764A

Filed: 2004-07-23     Issued: 2006-02-09

Log recording device, log recording program, and recording medium

(Original Assignee) Ricoh Co Ltd; 株式会社リコー     

Fumihiro Umetsu, 史浩 梅津
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set (アップ) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2006041764A
CLAIM 7
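The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).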

JP2006041764A
CLAIM 8
A program [プログラム] (corresponding different intermediate data) for causing a computer [コンピュータ] (processing data) to function as the log recording device according to any one of claims 1 to 7.

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2006041764A
CLAIM 8
A program for causing a computer [コンピュータ] (processing data) to function as the log recording device according to any one of claims 1 to 7.

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (アップ) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2006041764A
CLAIM 7
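The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).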

JP2006041764A
CLAIM 8
A program [プログラム] (corresponding different intermediate data) for causing a computer to function as the log recording device according to any one of claims 1 to 7.

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2006041764A
CLAIM 8
A program for causing a computer [コンピュータ] (processing data) to function as the log recording device according to any one of claims 1 to 7.

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (アップ) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set (アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2006041764A
CLAIM 7
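The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).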

JP2006041764A
CLAIM 8
A program for causing a computer [コンピュータ] (processing data) to function as the log recording device according to any one of claims 1 to 7.

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set (アップ) is a merging of a portion of the first and second intermediate data set .
JP2006041764A
CLAIM 7
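The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).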

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set (アップ) of each partition to a separate one of the reducers .
JP2006041764A
CLAIM 7
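The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).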
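
Claim 35 of US8190610B2 above recites partitioning an intermediate data set across several reducers, with at least some key-value pairs provided to more than one partition. The sketch below shows one way such a partitioner could look. It is an assumption-laden illustration: the function name, the `replicated_keys` parameter, and the CRC-based placement are hypothetical and are not drawn from the patent.

```python
import zlib

def partition_with_replication(pairs, num_reducers, replicated_keys):
    """Assign intermediate (key, value) pairs to reducer partitions.

    Pairs whose key is in replicated_keys are copied to every partition;
    all other pairs go to exactly one partition chosen by a stable hash.
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        if key in replicated_keys:
            # Replicated pair: every reducer receives a copy.
            for part in partitions:
                part.append((key, value))
        else:
            # zlib.crc32 is used instead of hash() so placement is stable across runs.
            index = zlib.crc32(repr(key).encode("utf-8")) % num_reducers
            partitions[index].append((key, value))
    return partitions

pairs = [(1, "a"), (2, "b"), (3, "c")]
parts = partition_with_replication(pairs, num_reducers=3, replicated_keys={2})
```

Each element of `parts` would then be handed to a separate reducer. Passing every key in `replicated_keys` gives the case in which all pairs reach all reducers.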

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2006041764A
CLAIM 7
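The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).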

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (アップ) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set (アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2006041764A
CLAIM 7
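The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).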

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set (アップ) is a merging of a portion of the first and second intermediate data set .
JP2006041764A
CLAIM 7
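The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).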

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set (アップ) of each partition to a separate one of the reducers .
JP2006041764A
CLAIM 7
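The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).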

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2006041764A
CLAIM 7
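The log recording device according to claim 3, wherein the control means performs control so that, after issuing a request to the imaging device to stop submission of new jobs, the log file [ログファイル] (second intermediate data) is backed up [バックアップ] (first set, second set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set).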




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050050030A1

Filed: 2004-07-20     Issued: 2005-03-03

Set definition language for relational data

(Original Assignee) Decode Genetics EHF     (Current Assignee) Decode Genetics EHF

Hakon Gudbjartsson, Thorvaldur Arnarson, Pavol Rovensky, Vilmundur Palmason
US8190610B2
CLAIM 1
. A method of processing data of a data set (data set) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050050030A1
CLAIM 1
. In a computer system , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .
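
Illustrative note (not part of either claim): the data flow recited in claim 1 of US8190610B2 above can be pictured with a minimal single-process Python sketch. It is not the patentee's implementation and it does not run on a distributed system. The group names (`employee`, `department`), the record layouts, and the helpers `map_employee`, `map_department`, `partition`, `run_maps`, `reduce_join`, and `run` are all hypothetical, chosen only to show two differently structured groups mapped separately and then merged on a common key.

```python
from collections import defaultdict

# Two data groups with different schemas that share a key (dept_id).
# Records and field layouts are invented for illustration.
EMPLOYEES = [("alice", 1), ("bob", 2), ("carol", 1)]   # (name, dept_id)
DEPARTMENTS = [(1, "research"), (2, "sales")]           # (dept_id, dept_name)


def map_employee(record):
    """Map function for the employee group: emit (dept_id, name)."""
    name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Map function for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def partition(records, n):
    """Split one group's records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def run_maps(groups, n_partitions=2):
    """Apply each group's own map function to each of its partitions.

    Intermediate values stay tagged with the group that produced them.
    """
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for record in part:
                for key, value in map_fn(record):
                    intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Merge on the common key, iterating each group's values separately."""
    for dept_name in values_by_group.get("department", []):
        for name in values_by_group.get("employee", []):
            yield key, name, dept_name


def run(groups):
    intermediate = run_maps(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


GROUPS = {
    "employee": (EMPLOYEES, map_employee),
    "department": (DEPARTMENTS, map_department),
}
RESULT = run(GROUPS)
```

In this sketch the per-group lists inside `reduce_join` stand in for the group-specific processing recited in the reducing step; a real system would distribute the partitions and reducers across machines.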

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set (data set) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .
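
Illustrative note (not part of either claim): one possible reading of the "record-locking" limitation in claim 1 of US20050050030A1 is that all predicates of a conjunct that refer to the same relation must hold on one and the same record. The Python sketch below contrasts that reading with a looser evaluation in which each predicate may be satisfied by a different record. The data (`DIAGNOSES`), the attribute names, and the helpers `match_same_record` and `match_any_record` are hypothetical and are not taken from the reference.

```python
# Hypothetical relation: each patient has zero or more diagnosis records.
DIAGNOSES = {
    "p1": [{"code": "I10", "year": 1999}, {"code": "E11", "year": 2003}],
    "p2": [{"code": "I10", "year": 2003}],
}


def match_same_record(records, predicates):
    """True if a single record satisfies every predicate of the conjunct."""
    return any(all(p(r) for p in predicates) for r in records)


def match_any_record(records, predicates):
    """Looser alternative: each predicate may be met by a different record."""
    return all(any(p(r) for r in records) for p in predicates)


# Conjunct with predicates on two attributes of the same relation.
CONJUNCT = [lambda r: r["code"] == "I10", lambda r: r["year"] == 2003]

LOCKED = sorted(k for k, recs in DIAGNOSES.items()
                if match_same_record(recs, CONJUNCT))
UNLOCKED = sorted(k for k, recs in DIAGNOSES.items()
                  if match_any_record(recs, CONJUNCT))
```

Under the same-record reading only `p2` qualifies, whereas the looser evaluation also admits `p1`, whose two predicates are met by two different records.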

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .
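
Illustrative note (not part of either claim): claims 23 to 25 of US8190610B2 above differ in how intermediate key/value pairs are assigned to partitions before reduction. A minimal Python sketch of the three variants follows. The function names, the integer keys, and the modulo assignment are assumptions made for illustration; the claims do not prescribe any particular partitioning function.

```python
def partition_exclusive(pairs, n):
    """Each key/value pair goes to exactly one partition (cf. claim 23)."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[key % n].append((key, value))  # assumes integer keys
    return parts


def partition_replicated(pairs, n, replicate):
    """Pairs selected by `replicate` go to every partition, the rest to one
    partition, so at least some pairs reach more than one (cf. claim 24)."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        targets = range(n) if replicate(key) else [key % n]
        for t in targets:
            parts[t].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    """Every pair goes to every partition (cf. claim 25)."""
    return [list(pairs) for _ in range(n)]


PAIRS = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
EXCLUSIVE = partition_exclusive(PAIRS, 2)
REPLICATED = partition_replicated(PAIRS, 2, replicate=lambda k: k == 0)
BROADCAST = partition_broadcast(PAIRS, 2)
```

Each resulting partition would then be handed to a separate reducer; replication is what lets a reducer see pairs that a purely key-based split would have sent elsewhere.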

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .
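
Illustrative note (not part of either claim): claims 26 to 28 of US8190610B2 above recite metadata generated by earlier stages and an iterator, one per data group, that provides the metadata to the reducing. The Python sketch below shows one way such an arrangement could look. The class `GroupIterator`, the metadata fields `sorted` and `count`, and the helpers `sort_group` and `reduce_with_metadata` are hypothetical; the claims do not specify what the metadata contains.

```python
class GroupIterator:
    """Iterates one data group's intermediate values for a key and carries
    metadata describing how those values were produced."""

    def __init__(self, group, values, metadata):
        self.group = group
        self.metadata = dict(metadata)
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


def sort_group(group, values):
    """Sort stage that records what it did as metadata."""
    ordered = sorted(values)
    return GroupIterator(group, ordered, {"sorted": True, "count": len(ordered)})


def reduce_with_metadata(key, iterators):
    """Report the smallest value per group, skipping the scan when the
    iterator's metadata says the values are already sorted."""
    out = {}
    for it in iterators:
        if it.metadata.get("sorted"):
            smallest = next(iter(it), None)   # first element is the minimum
        else:
            smallest = min(it, default=None)  # fall back to a full scan
        out[it.group] = (smallest, it.metadata.get("count"))
    return key, out


ITERATORS = [
    sort_group("a", [3, 1, 2]),
    GroupIterator("b", [9, 7], {"sorted": False, "count": 2}),
]
REDUCED = reduce_with_metadata(5, ITERATORS)
```

The point of the sketch is only that the reducer consults per-group metadata through the iterator rather than recomputing it from the values.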

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set (data set) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US20050050030A1
CLAIM 4
. A method as claimed in claim 1 wherein the step of providing a written representation includes using an expression with extended virtual relations , said extended virtual relations including one (second set) of (i) predicates on dimensions and (ii) a WHERE clause within a record operator .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (data set) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050050030A1
CLAIM 1
. In a computer system , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (data set) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050050030A1
CLAIM 1
. In a computer system , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (data set) are provided to all of the reducers .
US20050050030A1
CLAIM 1
. In a computer system , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set (data set) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US20050050030A1
CLAIM 4
. A method as claimed in claim 1 wherein the step of providing a written representation includes using an expression with extended virtual relations , said extended virtual relations including one (second set) of (i) predicates on dimensions and (ii) a WHERE clause within a record operator .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (data set) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (data set) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (data set) are provided to all of the reducers .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set (data set) in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .
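
The partitioning recited in claims 42 and 43 of US8190610B2 above can be illustrated with a minimal single-process sketch. It is not taken from the patent or from any cited reference; the names partition_with_replication, route_some and route_all, the sample pairs, and the routing rule are all hypothetical. It shows one way that some key-value pairs could be delivered to more than one partition (claim 42) and the limiting case where every pair reaches every reducer (claim 43).

```python
def partition_with_replication(pairs, n_reducers, route):
    """Assign each key-value pair to every partition index returned by route(key)."""
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        for idx in sorted(route(key)):
            partitions[idx].append((key, value))
    return partitions

N_REDUCERS = 3  # assumed number of reducers
pairs = [("a", 1), ("b", 2), ("c", 3)]  # hypothetical intermediate key-value pairs

def route_some(key):
    # Claim 42 style (assumed rule): key "a" is replicated to two partitions,
    # every other key goes to a single "home" partition.
    home = ord(key[0]) % N_REDUCERS
    return {home, (home + 1) % N_REDUCERS} if key == "a" else {home}

def route_all(key):
    # Claim 43 style: every pair is provided to all reducers.
    return set(range(N_REDUCERS))

replicated = partition_with_replication(pairs, N_REDUCERS, route_some)
broadcast = partition_with_replication(pairs, N_REDUCERS, route_all)
```

Each inner list would then be handed to a separate reducer. A deterministic routing function is used here instead of Python's built-in hash, whose result for strings varies between interpreter runs.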

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20050050030A1
CLAIM 1
. In a computer system (computer system) , a method of defining sets of data to be retrieved from a data store , comprising the steps of : providing a written representation of a desired data set in terms of dimensions and relation instances , the desired data set having a certain set type ;
implying constraints on relation instances or dimensions based on the set type of the desired data set and dimension expressions , and using the written representation to query the data store and retrieve the desired data set , including enforcing expressions that have predicates on multiple attributes per conjunct in a non-ambiguous way using automatic record-locking such that the predicates on attributes from a same relation are automatically enforced on a same record .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20060190195A1

Filed: 2004-07-16     Issued: 2006-08-24

Clinical examination analyzing device, clinical examination analyzing method, and program for allowing computer to execute the method

(Original Assignee) Kochi University NUC; A&T Corp     (Current Assignee) Kochi University NUC ; A&T Corp

Tatsuhisa Watanabe, Hiromi Kataoka, Akira Horimoto
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (calculating unit) to form corresponding intermediate data (calculating unit) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .
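
For orientation, the map and reduce elements of claim 1 of US8190610B2 above can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the patentee's implementation and not a distributed system: the two data groups, their schemas, the shared key dept_id, and every function name are hypothetical. Each data group is partitioned and mapped by its own mapping function, the intermediate pairs carry a tag so that they remain identifiable to their group, and the reduce step merges the groups on the key they have in common.

```python
from collections import defaultdict

# Two hypothetical data groups with different schemas and a common key (dept_id).
employees = [(1, "Ann"), (2, "Bob"), (1, "Cy")]            # (dept_id, name)
departments = [(1, "Search", "Sunnyvale"), (2, "Ads", "Burbank")]  # (dept_id, dept_name, city)

def partition(records, n):
    """Split one data group into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_employee(part):
    # Emit (key, (group_tag, value)) so the output stays identifiable to its group.
    return [(dept_id, ("employee", name)) for dept_id, name in part]

def map_department(part):
    return [(dept_id, ("department", (dept_name, city)))
            for dept_id, dept_name, city in part]

def run_map(group, mapper, n_partitions=2):
    """Apply the group's mapping function to each of its partitions."""
    out = []
    for part in partition(group, n_partitions):
        out.extend(mapper(part))
    return out

def reduce_join(intermediate):
    """Merge the two groups' intermediate data on the common key."""
    by_key = defaultdict(lambda: {"employee": [], "department": []})
    for key, (tag, value) in intermediate:
        by_key[key][tag].append(value)
    output = []
    for key in sorted(by_key):
        for dept_name, city in by_key[key]["department"]:
            for name in by_key[key]["employee"]:
                output.append((key, name, dept_name, city))
    return output

intermediate = run_map(employees, map_employee) + run_map(departments, map_department)
joined = reduce_join(intermediate)
```

The output tuples have a schema that differs from both inputs, which is the situation claims 33 and 40 also describe.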

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (different kind) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20060190195A1
CLAIM 24
. The device according to claim 18 , wherein the storing unit includes a database of reference patterns each having reference patterns for a different kind (different key) of clinical examination , and a group of reference patterns corresponding to a desired kind of clinical examination is retrieved from the database to be used .
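
One possible reading of the per-group iterators in claim 12 of US8190610B2 above is sketched below. Claim 5, on which claim 12 depends, is not reproduced in this section, so the sketch rests on the abstract's statement that a different iterator is applied to the intermediate values of each group. The names group_iterator and reduce_one_key, the group tags, and the field names are hypothetical.

```python
def group_iterator(tagged_values, group, project):
    """Lazily yield the values of one data group, read through that group's schema."""
    return (project(value) for tag, value in tagged_values if tag == group)

def reduce_one_key(key, tagged_values):
    # One iterator per data group; each reads a different field of a different schema.
    names = group_iterator(tagged_values, "employee", lambda v: v["name"])
    cities = group_iterator(tagged_values, "department", lambda v: v["city"])
    city_list = list(cities)  # materialized so it can be scanned once per name
    return [(key, name, city) for name in names for city in city_list]

# Hypothetical intermediate values that one reducer receives for key 1.
values_for_key_1 = [
    ("employee", {"name": "Ann"}),
    ("department", {"dept_name": "Search", "city": "Sunnyvale"}),
    ("employee", {"name": "Cy"}),
]
result = reduce_one_key(1, values_for_key_1)
```

How the iterators are nested inside the reduce function is a design choice; here the department iterator is consumed first so the employee iterator can stay lazy.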

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (calculating unit) that is associated with another reducer .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (calculating unit) that is associated with that reducer .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (waveform data) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (calculating unit) to form corresponding intermediate data (calculating unit) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US20060190195A1
CLAIM 21
. The device according to claim 18 , wherein data in each of the present data , the previous data , and the reference patterns includes waveform data (computing devices) .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (different kind) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20060190195A1
CLAIM 24
. The device according to claim 18 , wherein the storing unit includes a database of reference patterns each having reference patterns for a different kind (different key) of clinical examination , and a group of reference patterns corresponding to a desired kind of clinical examination is retrieved from the database to be used .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (calculating unit) that is associated with another reducer .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (calculating unit) that is associated with that reducer .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (calculating unit) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (waveform data) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (calculating unit) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20060190195A1
CLAIM 18
. A device for confirming a validity of result of clinical examination of a part of a subject for a clinical examination , the device comprising : a receiving unit configured to receive present data that is clinical data of the part obtained this time and previous data that is clinical data of the part obtained last time ;
a storing unit configured to store a plurality of reference patterns , the reference patterns being classified into a plurality of levels ;
a selecting unit configured to select , from the reference patterns , a first reference pattern best matching with the present data and a second reference pattern best matching with the previous data ;
a calculating unit (corresponding data partition to form corresponding intermediate data, includes data, corresponding data partition) configured to calculate a value indicative of a distance between a position of the first reference pattern and a position of the second reference pattern ;
and a determining unit configured to determine a validity of the present data based on the value .

US20060190195A1
CLAIM 21
. The device according to claim 18 , wherein data in each of the present data , the previous data , and the reference patterns includes waveform data (computing devices) .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1498815A2

Filed: 2004-06-30     Issued: 2005-01-19

Methods for ensuring referential integrity in multi-threaded replication engines

(Original Assignee) Gravic Inc     (Current Assignee) Gravic Inc

Bruce D. Holenstein, Paul J. Holenstein, Wilbur H. Highleyman
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1498815A2
CLAIM 49
The method of claim 48 wherein the appliers normally post transaction data to the target database only upon receipt of a commit step or operation associated with respective transaction data , the method further comprising : (d) repeating steps (first data, first data group, second data group) (b) and (c) by adding additional appliers as the current transaction load becomes close or equal to the total transaction capacity associated with all of the previously added appliers ;
(e) upon reaching a system limit wherein no more appliers can be added and the current transaction load becomes close or equal to the maximum transaction load capacity of all of the appliers , prematurely conducting a commit step or operation on at least some of the transaction data in at least some of the appliers , thereby causing the transaction data to become posted to the target database and deleted from the respective appliers .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (elapsed time) .
EP1498815A2
CLAIM 65
The method of claim 60 wherein the maximum transaction threshold limit is the maximum elapsed time (output data groups, second data set) span of a transaction .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1498815A2
CLAIM 49
The method of claim 48 wherein the appliers normally post transaction data to the target database only upon receipt of a commit step or operation associated with respective transaction data , the method further comprising : (d) repeating steps (first data, first data group, second data group) (b) and (c) by adding additional appliers as the current transaction load becomes close or equal to the total transaction capacity associated with all of the previously added appliers ;
(e) upon reaching a system limit wherein no more appliers can be added and the current transaction load becomes close or equal to the maximum transaction load capacity of all of the appliers , prematurely conducting a commit step or operation on at least some of the transaction data in at least some of the appliers , thereby causing the transaction data to become posted to the target database and deleted from the respective appliers .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (elapsed time) .
EP1498815A2
CLAIM 65
The method of claim 60 wherein the maximum transaction threshold limit is the maximum elapsed time (output data groups, second data set) span of a transaction .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (elapsed time) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1498815A2
CLAIM 49
The method of claim 48 wherein the appliers normally post transaction data to the target database only upon receipt of a commit step or operation associated with respective transaction data , the method further comprising : (d) repeating steps (first data, first data group, second data group) (b) and (c) by adding additional appliers as the current transaction load becomes close or equal to the total transaction capacity associated with all of the previously added appliers ;
(e) upon reaching a system limit wherein no more appliers can be added and the current transaction load becomes close or equal to the maximum transaction load capacity of all of the appliers , prematurely conducting a commit step or operation on at least some of the transaction data in at least some of the appliers , thereby causing the transaction data to become posted to the target database and deleted from the respective appliers .

EP1498815A2
CLAIM 65
The method of claim 60 wherein the maximum transaction threshold limit is the maximum elapsed time (output data groups, second data set) span of a transaction .
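
The map-reduce method recited in the US8190610B2 claim above can be sketched as follows. This is a minimal single-process illustration, not the patentee's implementation: the sample data, the key name emp_id, and the helpers partition, map_employee, map_bonus, run_map and reduce_together are all hypothetical, and a real system would run the mappers and reducers on separate machines.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas that
# share the key emp_id (the first element of each pair).
employees = [(1, {"name": "Ann", "dept_id": 10}), (2, {"name": "Bob", "dept_id": 20})]
bonuses = [(1, {"amount": 500}), (1, {"amount": 250}), (2, {"amount": 100})]


def partition(pairs, n):
    """Split a list of key-value pairs into n partitions by hashing the key."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[hash(key) % n].append((key, value))
    return parts


def map_employee(part):
    """Mapper for the first group (assumed schema): emit (emp_id, name)."""
    return [(key, value["name"]) for key, value in part]


def map_bonus(part):
    """Mapper for the second group (assumed schema): emit (emp_id, amount)."""
    return [(key, value["amount"]) for key, value in part]


def run_map(pairs, mapper, n_partitions=2):
    """Apply the mapper to each partition independently, collecting a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(pairs, n_partitions):
        for key, value in mapper(part):
            intermediate[key].append(value)
    return intermediate


def reduce_together(first, second):
    """Iterate over the keys common to both intermediate sets and merge them.

    The output schema, emp_id -> (name, total_bonus), differs from both inputs.
    """
    output = {}
    for key in sorted(first.keys() & second.keys()):
        output[key] = (first[key][0], sum(second[key]))
    return output


first_intermediate = run_map(employees, map_employee)
second_intermediate = run_map(bonuses, map_bonus)
result = reduce_together(first_intermediate, second_intermediate)
print(result)
```

Keeping the two intermediate sets separate until the reduce step is one simple way to keep each set identifiable to its data group; tagging each value with a group label would serve the same purpose.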

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (elapsed time) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1498815A2
CLAIM 49
The method of claim 48 wherein the appliers normally post transaction data to the target database only upon receipt of a commit step or operation associated with respective transaction data , the method further comprising : (d) repeating step (first data, first data group, second data group) s (b) and (c) by adding additional appliers as the current transaction load becomes close or equal to the total transaction capacity associated with all of the previously added appliers ;
(e) upon reaching a system limit wherein no more appliers can be added and the current transaction load becomes close or equal to the maximum transaction load capacity of all of the appliers , prematurely conducting a commit step or operation on at least some of the transaction data in at least some of the appliers , thereby causing the transaction data to become posted to the target database and deleted from the respective appliers .

EP1498815A2
CLAIM 65
The method of claim 60 wherein the maximum transaction threshold limit is the maximum elapsed time (output data groups, second data set) span of a transaction .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao

US20050187897A1

Filed: 2004-06-29     Issued: 2005-08-25

System and method for switching a data partition

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Deepak Pawar, Wey Guy, Lubor Kollar
US8190610B2
CLAIM 1
. A method of processing data (temporary storage) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (temporary storage) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (comprises instructions) are performed by a distributed system .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .
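
The three copy steps recited in claim 6 of US20050187897A1 amount to a swap through temporary storage. A minimal sketch, assuming each portion is modelled as a dictionary whose metadata entry merely points at row storage; the names switch_portions, pages_a, pages_b, staging_table and partition_3 are hypothetical and do not come from the reference:

```python
# Hypothetical row storage; the metadata only points at it.
pages_a = ["row-a1", "row-a2"]
pages_b = ["row-b1"]

source = {"name": "staging_table", "metadata": {"pages": pages_a, "row_count": 2}}
target = {"name": "partition_3", "metadata": {"pages": pages_b, "row_count": 1}}


def switch_portions(source, target):
    """Swap the metadata of two portions without moving any row data."""
    temporary = dict(target["metadata"])           # target metadata -> temporary storage
    target["metadata"] = dict(source["metadata"])  # source metadata -> target metadata
    source["metadata"] = temporary                 # temporary storage -> source metadata


switch_portions(source, target)
print(target["metadata"]["row_count"], source["metadata"]["row_count"])
```

Only the small metadata dictionaries are copied; the row lists themselves are never duplicated or moved, which is the property the reference's system claim describes as no data content needing to be moved during the swap.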

US20050187897A1
CLAIM 20
. The system of claim 18 , wherein the software component further comprises instructions (reducing operations) to lock the first structure and second structure before performing the switch .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (temporary storage) is a plurality of output data groups (temporary storage) .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (temporary storage) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (temporary storage) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (temporary storage) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .
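
Claim 5 of US8190610B2 recites an iterator that corresponds to each data group. One way this could look inside a reduce function is sketched below, assuming the intermediate values for a key arrive tagged by group name; the group names, tuple layouts, and the helpers employee_iterator, bonus_iterator and reduce_key are hypothetical.

```python
from typing import Dict, Iterator, List


def employee_iterator(values: List[tuple]) -> Iterator[str]:
    """Iterator for the 'employee' group: yields the name field of each value."""
    for name, _dept_id in values:
        yield name


def bonus_iterator(values: List[tuple]) -> Iterator[int]:
    """Iterator for the 'bonus' group: yields the amount field of each value."""
    for (amount,) in values:
        yield amount


# One iterator factory per data group (assumed group names).
GROUP_ITERATORS = {"employee": employee_iterator, "bonus": bonus_iterator}


def reduce_key(key, grouped: Dict[str, List[tuple]]):
    """Reduce all intermediate values for one key, reading each group through its own iterator."""
    iterators = {group: GROUP_ITERATORS[group](values) for group, values in grouped.items()}
    names = list(iterators["employee"])
    total = sum(iterators["bonus"])
    # Merge the two groups on the common key.
    return [(key, name, total) for name in names]


merged = reduce_key(1, {"employee": [("Ann", 10)], "bonus": [(500,), (250,)]})
print(merged)
```

Because each iterator knows only its own group's schema, the reducer body can combine the groups without inspecting the raw tuples.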

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (same index) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20050187897A1
CLAIM 16
. The method of claim 15 , wherein the data integrity test comprises verifying that the non-partitioned portion and the database have at least one of the same columns and the same index (partitioning step) es .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (temporary storage) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (temporary storage) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (temporary storage) that is not intermediate data .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (temporary storage) that is associated with another reducer .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (temporary storage) that is associated with that reducer .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (temporary storage) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US20050187897A1
CLAIM 22
. A system for swapping portions of at least one table , the system comprising : a processor having access to memory , the memory having instructions of software components ;
a first structure of data containing a first portion (computing devices) ;
a second structure of data containing a second portion ;
a software component having means to swap first and second portions , wherein no data content need be moved between the first structure and the second structure during a swap of the first portion and the second portion .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (temporary storage) is a plurality of output data groups (temporary storage) .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (temporary storage) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (temporary storage) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (temporary storage) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (temporary storage) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (temporary storage) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (temporary storage) that is not intermediate data .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (temporary storage) that is associated with another reducer .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (temporary storage) that is associated with that reducer .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (temporary storage) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (temporary storage) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (comprises instructions) are performed by a distributed system .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US20050187897A1
CLAIM 20
. The system of claim 18 , wherein the software component further comprises instructions (reducing operations) to lock the first structure and second structure before performing the switch .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (temporary storage) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050187897A1
CLAIM 6
. The method of claim 1 , wherein changing the pointers further comprises : copying target portion metadata into temporary storage (data group, first data group, second data group, output data groups, particular data group, processing data, includes data) ;
copying source portion metadata into target portion metadata ;
and copying metadata from the temporary storage into the source portion metadata .

US20050187897A1
CLAIM 22
. A system for swapping portions of at least one table , the system comprising : a processor having access to memory , the memory having instructions of software components ;
a first structure of data containing a first portion (computing devices) ;
a second structure of data containing a second portion ;
a software component having means to swap first and second portions , wherein no data content need be moved between the first structure and the second structure during a swap of the first portion and the second portion .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao

JP2005018751A

Filed: 2004-06-03     Issued: 2005-01-20

System and method for expressing and calculating a relationship between measures

(Original Assignee) Microsoft Corp

Amir Netz, Cristian Petculescu, Richard Tkachuk
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (memory) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005018751A
CLAIM 13
The method of claim 8 , further comprising a memory (different schema) that stores the first cache and the second cache .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (memory) than the iterator corresponding to another particular data group , for that reducer .
JP2005018751A
CLAIM 13
The method of claim 8 , further comprising a memory (different schema) that stores the first cache and the second cache .

US8190610B2
CLAIM 17
. A computer system (readable medium) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (memory) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005018751A
CLAIM 13
The method of claim 8, further comprising a memory [メモリ] (different schema) that stores the first cache and the second cache.

JP2005018751A
CLAIM 14
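A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.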
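
As a reading aid for claim 17 of US8190610B2, the following is a minimal single-process sketch of the recited flow: each data group is partitioned, mapped by its own map function, kept identifiable to its group, and then reduced so that the groups are merged on the key they have in common. The sample records, the group names `emp` and `dept`, and every function name are hypothetical and do not come from the patent or from any MapReduce library; a real deployment would run the map and reduce steps on a distributed system.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas that share
# the key "dept_id". None of these names come from the patent.
EMPLOYEES = [  # schema: (emp_id, name, dept_id)
    (1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10),
]
DEPARTMENTS = [  # schema: (dept_id, dept_name)
    (10, "Sales"), (20, "Ops"),
]


def partition(records, n):
    """Split one data group's records into n data partitions (round robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the employee group: emit (dept_id, name) pairs."""
    return [(dept_id, name) for _emp_id, name, dept_id in part]


def map_department(part):
    """Map function for the department group: emit (dept_id, dept_name) pairs."""
    return [(dept_id, dept_name) for dept_id, dept_name in part]


def run_map(groups, n_partitions=2):
    """Map each group independently; intermediate data stays tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Reduce: walk each group's values separately, then merge on the key."""
    dept_names = iter(values_by_group.get("dept", []))
    emp_names = sorted(values_by_group.get("emp", []))
    return [(key, dept_name, emp_names) for dept_name in dept_names]


def grouped_map_reduce():
    """Run the whole sketch and return the merged output records."""
    groups = {
        "emp": (EMPLOYEES, map_employee),
        "dept": (DEPARTMENTS, map_department),
    }
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output
```

Calling `grouped_map_reduce()` yields one output record per department key, combining values that originated in the two differently structured groups.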

US8190610B2
CLAIM 18
. The computer system (可読媒体) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JP2005018751A
CLAIM 14
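A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.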
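
For comparison with the US claims, the steps recited in JP2005018751A claim 14 can be sketched roughly as follows. This is an illustrative reading only: the dictionary-based caches, the function names, and the sample figures are assumptions made for the example, and "common attribute range" is interpreted here as the attribute values present in both caches.

```python
import operator


def build_index(cache, common_range):
    """Index one cache's measure data, restricted to the shared attribute range."""
    return {attr: cache[attr] for attr in common_range if attr in cache}


def measure_expression(first_cache, second_cache, attribute_range, op):
    """Apply an arithmetic operation to two measures over their common range."""
    # Attribute values in the selected range that both data types cover.
    common = [a for a in attribute_range
              if a in first_cache and a in second_cache]
    first_index = build_index(first_cache, common)
    second_index = build_index(second_cache, common)
    # Arithmetic step, with results gathered per attribute value.
    return {a: op(first_index[a], second_index[a]) for a in common}


# Hypothetical measures: revenue and units sold, keyed by month.
revenue = {"Jan": 100.0, "Feb": 90.0, "Mar": 50.0}
units = {"Jan": 4, "Feb": 3}
average_price = measure_expression(
    revenue, units, ["Jan", "Feb", "Mar"], operator.truediv
)
```

In this sketch "Mar" drops out of the result because only one of the two caches holds data for it.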

US8190610B2
CLAIM 19
. The computer system (可読媒体) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005018751A
CLAIM 14
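A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.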

US8190610B2
CLAIM 20
. The computer system (可読媒体) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005018751A
CLAIM 14
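A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.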

US8190610B2
CLAIM 21
. The computer system (可読媒体) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005018751A
CLAIM 14
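A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.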

US8190610B2
CLAIM 22
. The computer system (可読媒体) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2005018751A
CLAIM 13
The method of claim 8, further comprising a memory [メモリ] (different schema) that stores the first cache and the second cache.

JP2005018751A
CLAIM 14
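A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.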

US8190610B2
CLAIM 23
. The computer system (可読媒体) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005018751A
CLAIM 14
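A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.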

US8190610B2
CLAIM 24
. The computer system (可読媒体) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005018751A
CLAIM 14
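A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.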

US8190610B2
CLAIM 25
. The computer system (可読媒体) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2005018751A
CLAIM 14
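A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.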
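
Claims 23 to 25 of US8190610B2 differ only in how intermediate key/value pairs are routed to partitions before the reducers run: each pair to a separate partition (claim 23), at least some pairs to more than one partition (claim 24), or all pairs to all partitions (claim 25). The sketch below shows one way each routing could look. The function names, the `copies` parameter, and the choice of a CRC32 hash are assumptions made for illustration, not details taken from the patent.

```python
import zlib


def _home_partition(key, n):
    """Pick a stable partition index for a key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def partition_exclusive(pairs, n):
    """Claim 23 style: every key/value pair lands in exactly one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[_home_partition(key, n)].append((key, value))
    return parts


def partition_replicated(pairs, n, copies):
    """Claim 24 style: each pair is copied to `copies` consecutive partitions."""
    if not 1 <= copies <= n:
        raise ValueError("copies must be between 1 and n")
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        start = _home_partition(key, n)
        for offset in range(copies):
            parts[(start + offset) % n].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    """Claim 25 style: every pair is provided to every partition."""
    return partition_replicated(pairs, n, copies=n)
```

Each resulting partition would then be handed to a separate reducer. Replication trades extra data movement for the ability of a reducer to see pairs whose keys it does not own, which a cross-group comparison may need.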

US8190610B2
CLAIM 26
. The computer system (可読媒体) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005018751A
CLAIM 14
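A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.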

US8190610B2
CLAIM 27
. The computer system (可読媒体) of claim 26 , wherein : the reducing includes processing the metadata .
JP2005018751A
CLAIM 14
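A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.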

US8190610B2
CLAIM 28
. The computer system (可読媒体) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005018751A
CLAIM 14
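A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.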
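
Claims 26 to 28 of US8190610B2 recite metadata generated during the sort, group-by-key and combine tasks, and an iterator per data group that provides that metadata to the reducing. One hypothetical shape for such an iterator is sketched below; the class name, the metadata fields `sorted` and `count`, and the reduce function are illustrative assumptions rather than the patent's own structures.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class GroupIterator:
    """Iterates over one data group's values and carries metadata about them."""
    group: str
    values: List[Any]
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __iter__(self) -> Iterator[Any]:
        return iter(self.values)


def make_iterator(group, values):
    """Sort a group's values and record metadata describing that processing."""
    ordered = sorted(values)
    return GroupIterator(group, ordered, {"sorted": True, "count": len(ordered)})


def reduce_with_metadata(key, iterators):
    """Reduce over several group iterators, consulting each one's metadata."""
    summary = {it.group: it.metadata["count"] for it in iterators}
    merged = [value for it in iterators for value in it]
    return key, summary, merged
```

A reducer written this way can, for example, skip re-sorting a group whose iterator already reports `sorted` as true.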

US8190610B2
CLAIM 29
. The computer system (可読媒体) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
JP2005018751A
CLAIM 14
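A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.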

US8190610B2
CLAIM 30
. The computer system (可読媒体) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2005018751A
CLAIM 14
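A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.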

US8190610B2
CLAIM 31
. The computer system (可読媒体) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2005018751A
CLAIM 14
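A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.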

US8190610B2
CLAIM 32
. The computer system (可読媒体) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JP2005018751A
CLAIM 14
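A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.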

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (メモリ) over a computer system (可読媒体) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005018751A
CLAIM 13
The method of claim 8, further comprising a memory [メモリ] (different schema) that stores the first cache and the second cache.

JP2005018751A
CLAIM 14
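A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.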

US8190610B2
CLAIM 40
. A computer system (可読媒体) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (メモリ) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005018751A
CLAIM 13
The method of claim 8, further comprising a memory [メモリ] (different schema) that stores the first cache and the second cache.

JP2005018751A
CLAIM 14
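A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.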

US8190610B2
CLAIM 41
. The computer system (可読媒体) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005018751A
CLAIM 14
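A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.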

US8190610B2
CLAIM 42
. The computer system (可読媒体) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005018751A
CLAIM 14
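A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.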

US8190610B2
CLAIM 43
. The computer system (可読媒体) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
JP2005018751A
CLAIM 14
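A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.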

US8190610B2
CLAIM 44
. The computer system (可読媒体) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005018751A
CLAIM 14
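A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.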

US8190610B2
CLAIM 45
. The computer system (可読媒体) of claim 44 , wherein the reducing includes processing the metadata .
JP2005018751A
CLAIM 14
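A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.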

US8190610B2
CLAIM 46
. The computer system (可読媒体) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2005018751A
CLAIM 14
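A computer-readable medium [コンピュータ可読媒体] (computer system) for computing a measure expression over a selected attribute range, wherein the measure expression includes a relationship between a first measure and a second measure, the first measure corresponds to a first data type, the second measure corresponds to a second data type, and the relationship is defined by an arithmetic operation; the method being characterized in that the computer-readable medium comprises computer-executable instructions that perform the steps of: retrieving a first cache that corresponds to the first data type and contains data on the first measure over the selected attribute range; generating, from the first cache, a first index containing data on the first measure over the selected attribute range common to the first data type and the second data type; retrieving a second cache that corresponds to the second data type and contains data on the second measure over the selected attribute range; generating, from the second cache, a second index containing data on the second measure over the selected attribute range common to the first data type and the second data type; performing the arithmetic operation on the data on the first measure from the first index and the data on the second measure from the second index to obtain result data; and aggregating the result data over the selected attribute range common to the first data type and the second data type.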




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050283679A1

Filed: 2004-06-03     Issued: 2005-12-22

Method, system, and computer program product for dynamically managing power in microprocessor chips according to present processing demands

(Original Assignee) International Business Machines Corp     (Current Assignee) Microsoft Technology Licensing LLC

Thomas Heller, Michael Ignatowski, Bernard Meyerson, James Rymarczyk
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one, said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050283679A1
CLAIM 5
. A method in a symmetric multiprocessor server for dynamically managing power in a microprocessor chip included in said server (second set) , said microprocessor chip including a plurality of hardware elements , said plurality of hardware elements including a plurality of physical processing cores , said method comprising the steps of : selecting a virtual processor to be dispatched ;
identifying ones of said plurality of hardware elements , to which to dispatch said virtual processor , to use to execute said virtual processor , said ones of said plurality of hardware elements including one (second set) of said plurality of physical processing cores ;
and powering-on said identified ones of said plurality of hardware elements during said virtual processor being dispatched .

US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .
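
For orientation, the steps recited in claim 33 of US8190610B2 can be sketched as follows. This is a minimal single-process illustration, not the patentee's implementation: the sample records, the group tags ("employee", "department"), the shared key `dept_id`, and every function name are hypothetical and chosen only to show two differently structured data sets being mapped separately and then reduced together on a key in common.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: two data groups with different schemas that
# share the key dept_id.
EMPLOYEES = [
    ("e1", {"name": "Ann", "dept_id": 10}),
    ("e2", {"name": "Bob", "dept_id": 20}),
    ("e3", {"name": "Cy", "dept_id": 10}),
]
DEPARTMENTS = [
    (10, {"dept_name": "Sales"}),
    (20, {"dept_name": "Ops"}),
]


def partition(records, n):
    """Split a data set into n partitions (round-robin is an arbitrary choice)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    """Map function for the first group: emit (dept_id, name)."""
    return [(rec["dept_id"], rec["name"]) for _, rec in part]


def map_departments(part):
    """Map function for the second group: emit (dept_id, dept_name)."""
    return [(dept_id, rec["dept_name"]) for dept_id, rec in part]


def shuffle(tagged_outputs):
    """Group intermediate pairs by key, keeping values identifiable to their group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for group, pairs in tagged_outputs:
        for key, value in pairs:
            grouped[key][group].append(value)
    return grouped


def reduce_join(key, values_by_group):
    """Iterate over each group's values for one key and merge them."""
    employees = values_by_group.get("employee", [])
    departments = values_by_group.get("department", [])
    # The output schema (dept_id, name, dept_name) differs from both inputs.
    return [(key, name, dept) for name, dept in product(employees, departments)]


def run(n_partitions=2):
    tagged = [("employee", map_employees(p)) for p in partition(EMPLOYEES, n_partitions)]
    tagged += [("department", map_departments(p)) for p in partition(DEPARTMENTS, n_partitions)]
    grouped = shuffle(tagged)
    output = []
    for key in sorted(grouped):
        output.extend(reduce_join(key, grouped[key]))
    return output
```

In this sketch the per-group lists inside `reduce_join` stand in for the per-group iterators mentioned in the abstract; a distributed runtime would execute the map and reduce calls on separate machines.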

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .
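
The partitioning recited in claims 35 and 36 of US8190610B2 (intermediate key-value pairs provided to more than one partition, or to all reducers) can be pictured with the short sketch below. It is an illustration under stated assumptions only: integer keys, modulo assignment for non-replicated pairs, and the `replicate` predicate are all hypothetical choices that the claims do not prescribe.

```python
def partition_with_replication(pairs, n_reducers, replicate):
    """Assign intermediate (key, value) pairs to reducer partitions.

    Pairs whose key satisfies replicate(key) are copied to every partition;
    all other pairs go to exactly one partition chosen by key % n_reducers.
    Integer keys are assumed so that the assignment is deterministic.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if replicate(key):
            for part in partitions:
                part.append((key, value))
        else:
            partitions[key % n_reducers].append((key, value))
    return partitions


# Hypothetical intermediate data.
PAIRS = [(1, "a"), (2, "b"), (3, "c")]

# Claim 35 style: only some pairs (here key 3) reach more than one partition.
some_replicated = partition_with_replication(PAIRS, 2, lambda k: k == 3)

# Claim 36 style: every pair is provided to every reducer.
all_replicated = partition_with_replication(PAIRS, 2, lambda k: True)
```

Each resulting partition would then be handed to a separate reducer.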

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one, said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050283679A1
CLAIM 5
. A method in a symmetric multiprocessor server for dynamically managing power in a microprocessor chip included in said server (second set) , said microprocessor chip including a plurality of hardware elements , said plurality of hardware elements including a plurality of physical processing cores , said method comprising the steps of : selecting a virtual processor to be dispatched ;
identifying ones of said plurality of hardware elements , to which to dispatch said virtual processor , to use to execute said virtual processor , said ones of said plurality of hardware elements including one (second set) of said plurality of physical processing cores ;
and powering-on said identified ones of said plurality of hardware elements during said virtual processor being dispatched .

US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20050283679A1
CLAIM 19
. The method according to claim 5 , further comprising the steps of : determining whether any virtual processor is to be dispatched to a particular one of said plurality of physical processing cores ;
in response to a determination that no virtual processor is to be dispatched to said particular one of said plurality of physical processing cores , powering-off said particular one of said plurality of physical processing cores if said particular one of said plurality of physical processing cores is not the last powered-on one of said plurality of physical processing cores in said system (data set, first data set, second data set) .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040215640A1

Filed: 2004-04-23     Issued: 2004-10-28

Parallel recovery by non-failed nodes

(Original Assignee) Oracle International Corp     (Current Assignee) Oracle International Corp

Roger Bamford, Sashikanth Chandrasekaran, Angelo Pruscino
US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (particular data) group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040215640A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data (particular data) item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation to be performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item is exclusively owned by said particular node ;
in response to a failure that involves a set of persistent data items exclusively owned by a single node , performing the steps of : assigning , to each of two or more recovery nodes , exclusive ownership of a subset of the set of persistent data items that were involved in the failure ;
and each recovery node of the two or more recovery nodes performing a recovery operation on the subset of persistent data items that were assigned to the recovery node .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (particular data) group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040215640A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data (particular data) item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation to be performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item is exclusively owned by said particular node ;
in response to a failure that involves a set of persistent data items exclusively owned by a single node , performing the steps of : assigning , to each of two or more recovery nodes , exclusive ownership of a subset of the set of persistent data items that were involved in the failure ;
and each recovery node of the two or more recovery nodes performing a recovery operation on the subset of persistent data items that were assigned to the recovery node .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040215640A1
CLAIM 12
. The method of claim 3 wherein recovery of the failed node involves various tasks , the method further comprising the steps of : a recovery coordinator determining that a first set (first set) of one or more tasks required for recovery of said failed node should be performed serially , and that a second set (second set) of one or more tasks required for recovery of said failed node should be performed in parallel ;
and performing the first set of one or more tasks serially ;
and using said two or more recovery nodes to perform said second set of one or more tasks in parallel .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040215640A1
CLAIM 12
. The method of claim 3 wherein recovery of the failed node involves various tasks , the method further comprising the steps of : a recovery coordinator determining that a first set (first set) of one or more tasks required for recovery of said failed node should be performed serially , and that a second set (second set) of one or more tasks required for recovery of said failed node should be performed in parallel ;
and performing the first set of one or more tasks serially ;
and using said two or more recovery nodes to perform said second set of one or more tasks in parallel .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1778089A

Filed: 2004-04-22     Issued: 2006-05-24

Peer-to-peer transfer of content (内容的对等传输)

(Original Assignee) Koninklijke Philips NV     (Current Assignee) Koninklijke Philips NV

W·F·J·方蒂恩, N·兰伯特
US8190610B2
CLAIM 1
. A method of processing data (包括程序) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (包括程序) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
CN1778089A
CLAIM 13
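. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.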

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (包括程序) is a plurality of output data groups .
CN1778089A
CLAIM 13
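. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.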

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (包括程序) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
CN1778089A
CLAIM 13
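. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.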

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (包括程序) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
CN1778089A
CLAIM 13
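. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.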

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (包括程序) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
CN1778089A
CLAIM 13
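. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.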

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (进一步) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
CN1778089A
CLAIM 2
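. The method according to claim 1 , the method further [进一步] (partitioning step) comprising the steps of : uploading (700) second content that satisfies a second selection criterion , together with the second selection criterion , from a fourth device to a server ; and rewarding (800) the fourth device for uploading the second content and the second criterion to the server .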

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (包括程序) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
CN1778089A
CLAIM 13
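. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.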

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (包括程序) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
CN1778089A
CLAIM 13
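. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.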

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (包括程序) that is not intermediate data .
CN1778089A
CLAIM 13
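. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.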

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (个人计算, 计算机程) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (包括程序) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1778089A
CLAIM 5
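. The method according to any one of claims 1 to 4 , characterized in that any one of the devices may be a video cassette recorder (VCR) , a personal digital assistant (PDA) , a mobile phone , a television , a radio , a DVD player , a CD player , an information board , a web tablet , a smart remote , a peer , or a personal computer [个人计算] (computing devices) .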

CN1778089A
CLAIM 13
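. A computer program [计算机程] (computing devices) product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.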

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (包括程序) is a plurality of output data groups .
CN1778089A
CLAIM 13
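. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.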

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (包括程序) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
CN1778089A
CLAIM 13
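. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.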

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (包括程序) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
CN1778089A
CLAIM 13
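. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.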

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (包括程序) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
CN1778089A
CLAIM 13
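. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.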

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (包括程序) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
CN1778089A
CLAIM 13
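. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.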

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (包括程序) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
CN1778089A
CLAIM 13
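. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) code means for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.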

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (包括程序) that is not intermediate data .
CN1778089A
CLAIM 13
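. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) mode means, for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.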

US8190610B2
CLAIM 33
. A map-reduce method of processing data (包括程序) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (包括程序) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
CN1778089A
CLAIM 13
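. A computer program product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) mode means, for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.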
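
As a reading aid for the steps recited in claim 33 of US8190610B2 above (partitioning two data sets with different schemas, mapping each, and reducing both intermediate sets together on a common key), a minimal single-process Python sketch follows. It is an assumption-laden illustration rather than the patented implementation: the function names `partition`, `map_orders`, `map_customers`, and `run`, the sample schemas, and the round-robin partitioning are all hypothetical, and nothing here is distributed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, dict]  # (key, value)


def partition(records: List[dict], n: int) -> List[List[dict]]:
    """Split a data set into n partitions (round-robin, for illustration)."""
    return [records[i::n] for i in range(n)]


def map_orders(part: List[dict]) -> List[Pair]:
    """Mapper for the first schema: (order_id, cust_id, amount)."""
    return [(r["cust_id"], {"amount": r["amount"]}) for r in part]


def map_customers(part: List[dict]) -> List[Pair]:
    """Mapper for the second schema: (cust_id, name)."""
    return [(r["cust_id"], {"name": r["name"]}) for r in part]


def run(orders: List[dict], customers: List[dict], n_parts: int = 2) -> List[dict]:
    # intermediate[key][group] holds the mapped values of one data group.
    intermediate: Dict[str, Dict[str, List[dict]]] = defaultdict(
        lambda: {"orders": [], "customers": []}
    )
    jobs = (("orders", orders, map_orders), ("customers", customers, map_customers))
    for group, data, mapper in jobs:
        for part in partition(data, n_parts):
            for key, value in mapper(part):
                intermediate[key][group].append(value)

    # Reduce both intermediate data sets together on the common key cust_id.
    output = []
    for key in sorted(intermediate):
        groups = intermediate[key]
        total = sum(v["amount"] for v in groups["orders"])
        for cust in groups["customers"]:
            # Output schema (name, total) differs from both input schemas.
            output.append({"name": cust["name"], "total": total})
    return output


orders = [
    {"order_id": 1, "cust_id": "c1", "amount": 10},
    {"order_id": 2, "cust_id": "c2", "amount": 5},
    {"order_id": 3, "cust_id": "c1", "amount": 7},
]
customers = [{"cust_id": "c1", "name": "Ann"}, {"cust_id": "c2", "name": "Bob"}]
joined = run(orders, customers)
print(joined)
```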

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (个人计算, 计算机程) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (包括程序) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CN1778089A
CLAIM 5
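. The method according to any one of claims 1 to 4, characterized in that any one of said devices may be a video cassette recorder (VCR), a personal digital assistant (PDA), a mobile phone, a television, a radio, a DVD player, a CD player, an information board, a web tablet, a smart remote, a peer, or a personal computer [个人计算] (computing devices).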

CN1778089A
CLAIM 13
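. A computer program [计算机程] (computing devices) product stored on a computer-readable medium and comprising program [包括程序] (data group, second data group, particular data group, processing data) mode means, for performing the method of any one of claims 1 to 5 when the computer program is run on a computer.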




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1781105A

Filed: 2004-03-31     Issued: 2006-05-31

Preserving hierarchical information in mapping between XML documents and relational data

(Original Assignee) Oracle International Corp     (Current Assignee) Oracle International Corp ; Oracle America Inc

拉维·默西, 穆拉利达尔·克里希纳普拉萨德, 阿南德·马尼库蒂, 刘贞, 詹姆士·沃纳
US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (进一步) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
CN1781105A
CLAIM 4
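. The method of claim 2, wherein said step of generating said ordered collection further [进一步] (partitioning step) comprises the steps of: receiving, from said first group, a current XML structure at a next node of said XML tree hierarchy that immediately follows a first node of said XML tree hierarchy in XML document order; generating a current data item to represent said current XML structure; and associating, in a particular entry of said ordered collection, a current level with said current data item.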

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (来计算) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1781105A
CLAIM 1
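. A method for converting data between XML structures and SQL structures in a Structured Query Language (SQL) compliant database management system (DBMS) that allows data items to represent Extensible Markup Language (XML) structures, comprising the steps of: receiving an SQL statement that includes a particular operator, said particular operator operating on a first data item that represents a first group of zero or more XML elements; and, during execution of said SQL statement, evaluating [来计算] (computing devices) said particular operator by generating an ordered collection of zero or more data items, wherein each distinct data item in said ordered collection is based on a distinct XML element from said first group, and wherein, for each XML element in said first group, there is a data item in said ordered collection.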

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (来计算) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CN1781105A
CLAIM 1
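. A method for converting data between XML structures and SQL structures in a Structured Query Language (SQL) compliant database management system (DBMS) that allows data items to represent Extensible Markup Language (XML) structures, comprising the steps of: receiving an SQL statement that includes a particular operator, said particular operator operating on a first data item that represents a first group of zero or more XML elements; and, during execution of said SQL statement, evaluating [来计算] (computing devices) said particular operator by generating an ordered collection of zero or more data items, wherein each distinct data item in said ordered collection is based on a distinct XML element from said first group, and wherein, for each XML element in said first group, there is a data item in said ordered collection.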




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1534518A

Filed: 2004-03-26     Issued: 2004-10-06

Replication of consistency units in an application-defined system

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Corp

C·纳拉亚南, R·P·辛格, J·B·帕勒姆
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (通过下列) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
CN1534518A
CLAIM 24
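. The method of claim 23, characterized in that conflicts are resolved at the logical record level by the following [通过下列] (first data) steps: comparing logical record lineage metadata at the logical record level of the consistency unit with logical record lineage metadata at the logical record level of the replica; and selecting the winning logical record lineage metadata according to a predetermined conflict policy.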

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (一种计算) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (通过下列) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1534518A
CLAIM 18
. A computer [一种计算] (computing devices) comprising the system of claim 1.

CN1534518A
CLAIM 24
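. The method of claim 23, characterized in that conflicts are resolved at the logical record level by the following [通过下列] (first data) steps: comparing logical record lineage metadata at the logical record level of the consistency unit with logical record lineage metadata at the logical record level of the replica; and selecting the winning logical record lineage metadata according to a predetermined conflict policy.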

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (通过下列) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
CN1534518A
CLAIM 24
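. The method of claim 23, characterized in that conflicts are resolved at the logical record level by the following [通过下列] (first data) steps: comparing logical record lineage metadata at the logical record level of the consistency unit with logical record lineage metadata at the logical record level of the replica; and selecting the winning logical record lineage metadata according to a predetermined conflict policy.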

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (一种计算) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (通过下列) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CN1534518A
CLAIM 18
. A computer [一种计算] (computing devices) comprising the system of claim 1.

CN1534518A
CLAIM 24
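. The method of claim 23, characterized in that conflicts are resolved at the logical record level by the following [通过下列] (first data) steps: comparing logical record lineage metadata at the logical record level of the consistency unit with logical record lineage metadata at the logical record level of the replica; and selecting the winning logical record lineage metadata according to a predetermined conflict policy.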




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20050216428A1

Filed: 2004-03-24     Issued: 2005-09-29

Distributed data management system

(Original Assignee) Hitachi Ltd     (Current Assignee) Hitachi Ltd

Yuichi Yagawa
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (one second) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20050216428A1
CLAIM 1
. A method for distributing data among a plurality of data storage systems comprising : obtaining and storing selection criteria ;
producing profile information for a first data object that is stored in a first data storage system , said profile information comprising content-based information associated with said first data object ;
and selectively copying said first data object to at least one second (first data, first data set) data storage system based on said selection criteria and on said profile information , wherein said first data object is copied to said second data storage system depending on content-based information associated with said first data object .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (one second) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20050216428A1
CLAIM 1
. A method for distributing data among a plurality of data storage systems comprising : obtaining and storing selection criteria ;
producing profile information for a first data object that is stored in a first data storage system , said profile information comprising content-based information associated with said first data object ;
and selectively copying said first data object to at least one second (first data, first data set) data storage system based on said selection criteria and on said profile information , wherein said first data object is copied to said second data storage system depending on content-based information associated with said first data object .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (more candidate) over a computer system , the method comprising : for a first data (one second) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20050216428A1
CLAIM 1
. A method for distributing data among a plurality of data storage systems comprising : obtaining and storing selection criteria ;
producing profile information for a first data object that is stored in a first data storage system , said profile information comprising content-based information associated with said first data object ;
and selectively copying said first data object to at least one second (first data, first data set) data storage system based on said selection criteria and on said profile information , wherein said first data object is copied to said second data storage system depending on content-based information associated with said first data object .

US20050216428A1
CLAIM 20
. A data system comprising : a plurality of data centers ;
and a plurality of client systems in data communication with said data centers , each data center comprising : a data storage component ;
a file server component operable to exchange data between a client system and said data storage component ;
a replicator component ;
a receiver component ;
and file selection criteria , wherein said replicator component is operable to produce profile data for a data object that is to be replicated among one or more candidate (groups having different schema) target data centers and to receive a selection indication from each of said candidate target data centers , and to selectively communicate said data object to a candidate target data center based on its selection indication , said profile data representative of content of said data object , wherein said receiver component is operable to receive profile data information from a source data center , said receiver component further operable to communicate a selection indication to said source data center based on said file selection criteria and on said profile data .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (one second) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20050216428A1
CLAIM 1
. A method for distributing data among a plurality of data storage systems comprising : obtaining and storing selection criteria ;
producing profile information for a first data object that is stored in a first data storage system , said profile information comprising content-based information associated with said first data object ;
and selectively copying said first data object to at least one second (first data, first data set) data storage system based on said selection criteria and on said profile information , wherein said first data object is copied to said second data storage system depending on content-based information associated with said first data object .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005267301A

Filed: 2004-03-19     Issued: 2005-09-29

Recovery method and apparatus using synchronous log transfer and asynchronous DB data transfer

(Original Assignee) Hitachi Ltd

Nobuo Kawamura, Takashi Oeda, Kota Yamaguchi
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005267301A
CLAIM 1
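A disaster recovery method for switching to a standby database processing system and continuing database processing when a failure occurs in an active database processing system, the method comprising: a step of receiving, from a host computer, write requests for log information indicating the content of database processing performed on a database buffer of the host computer [コンピュ] (processing data), database data updated in said database buffer, and status information indicating the position of the log information to be used at failure recovery; a step of updating the log information, the data in the database area, and the status information in a primary storage subsystem in accordance with the content of the received write requests; a step of transferring a received write request for log information to a secondary storage subsystem, which is the storage subsystem of the standby system, by synchronous remote copy processing; and a step of temporarily accumulating received write requests for database data or status information and transferring them to the secondary storage subsystem by asynchronous remote copy processing.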

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2005267301A
CLAIM 1
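A disaster recovery method for switching to a standby database processing system and continuing database processing when a failure occurs in an active database processing system, the method comprising: a step of receiving, from a host computer, write requests for log information indicating the content of database processing performed on a database buffer of the host computer [コンピュ] (processing data), database data updated in said database buffer, and status information indicating the position of the log information to be used at failure recovery; a step of updating the log information, the data in the database area, and the status information in a primary storage subsystem in accordance with the content of the received write requests; a step of transferring a received write request for log information to a secondary storage subsystem, which is the storage subsystem of the standby system, by synchronous remote copy processing; and a step of temporarily accumulating received write requests for database data or status information and transferring them to the secondary storage subsystem by asynchronous remote copy processing.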

US8190610B2
CLAIM 17
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005267301A
CLAIM 6
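The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.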

US8190610B2
CLAIM 18
. The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JP2005267301A
CLAIM 6
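The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.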

US8190610B2
CLAIM 19
. The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005267301A
CLAIM 6
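The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.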

US8190610B2
CLAIM 20
. The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005267301A
CLAIM 6
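The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.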

US8190610B2
CLAIM 21
. The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005267301A
CLAIM 6
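The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.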

US8190610B2
CLAIM 22
. The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005267301A
CLAIM 6
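The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.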

US8190610B2
CLAIM 23
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005267301A
CLAIM 6
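The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.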

US8190610B2
CLAIM 24
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005267301A
CLAIM 6
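The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.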

US8190610B2
CLAIM 25
. The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2005267301A
CLAIM 6
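The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.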
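
For orientation, the partitioning alternatives recited in claims 23 to 25 of US8190610B2 can be contrasted in a minimal sketch. This is an illustration only, not the patentee's implementation: the function names, the modulo-hash routing rule and the sample pairs are all assumptions, and integer keys are used so that the routing is reproducible.

```python
def partition_exclusive(pairs, num_partitions):
    """Claim 23 style: every key/value pair goes to exactly one partition.

    Routing by hash(key) modulo the partition count is an assumed rule.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def partition_replicated(pairs, num_partitions):
    """Claim 25 style: every key/value pair is provided to every partition."""
    return [list(pairs) for _ in range(num_partitions)]


# Hypothetical intermediate data with integer keys.
intermediate = [(1, "a"), (2, "b"), (3, "c")]

exclusive = partition_exclusive(intermediate, 2)
replicated = partition_replicated(intermediate, 2)
```

Each partition would then be handed to a separate reducer. Claim 24 covers the middle ground in which only some pairs reach more than one partition, which the sketch does not model.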

US8190610B2
CLAIM 26
. The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005267301A
CLAIM 6
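The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.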

US8190610B2
CLAIM 27
. The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JP2005267301A
CLAIM 6
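The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.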

US8190610B2
CLAIM 28
. The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005267301A
CLAIM 6
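The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.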

US8190610B2
CLAIM 29
. The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2005267301A
CLAIM 1
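A disaster recovery method for switching to a standby database processing system and continuing database processing when a failure occurs in an active database processing system, the method comprising: receiving, from a host computer, write requests for log information indicating the content of database processing performed on a database buffer of the host computer [コンピュ] (processing data), for database data updated in the database buffer, and for status information indicating the position of the log information used for failure recovery; updating the log information, the data in the database area, and the status information in a primary storage subsystem according to the content of the received write requests; transferring a received write request for log information to a secondary storage subsystem, which is the standby storage subsystem, by synchronous remote copy processing; and temporarily accumulating received write requests for database data or status information and transferring them to the secondary storage subsystem by asynchronous remote copy processing.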

JP2005267301A
CLAIM 6
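The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.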

US8190610B2
CLAIM 30
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2005267301A
CLAIM 6
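The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.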

US8190610B2
CLAIM 31
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2005267301A
CLAIM 6
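The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.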

US8190610B2
CLAIM 32
. The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JP2005267301A
CLAIM 6
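The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.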

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system (行うこと) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005267301A
CLAIM 1
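A disaster recovery method for switching to a standby database processing system and continuing database processing when a failure occurs in an active database processing system, the method comprising: receiving, from a host computer, write requests for log information indicating the content of database processing performed on a database buffer of the host computer [コンピュ] (processing data), for database data updated in the database buffer, and for status information indicating the position of the log information used for failure recovery; updating the log information, the data in the database area, and the status information in a primary storage subsystem according to the content of the received write requests; transferring a received write request for log information to a secondary storage subsystem, which is the standby storage subsystem, by synchronous remote copy processing; and temporarily accumulating received write requests for database data or status information and transferring them to the secondary storage subsystem by asynchronous remote copy processing.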

JP2005267301A
CLAIM 6
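The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.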

US8190610B2
CLAIM 40
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005267301A
CLAIM 6
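The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.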

US8190610B2
CLAIM 41
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005267301A
CLAIM 6
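The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.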

US8190610B2
CLAIM 42
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005267301A
CLAIM 6
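The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.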

US8190610B2
CLAIM 43
. The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
JP2005267301A
CLAIM 6
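The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.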

US8190610B2
CLAIM 44
. The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005267301A
CLAIM 6
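The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.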

US8190610B2
CLAIM 45
. The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JP2005267301A
CLAIM 6
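The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.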

US8190610B2
CLAIM 46
. The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2005267301A
CLAIM 6
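The system according to claim 5, wherein the first storage system, upon receiving from the first computer a write request for database data and status information at a checkpoint, transfers that request together with the temporarily accumulated write requests to the second storage subsystem and thereafter issues a completion notification [行うこと] (computer system) to the first computer.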




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2004303212A

Filed: 2004-03-02     Issued: 2004-10-28

Proactive caching system and method using OLAP variants

(Original Assignee) Microsoft Corp; マイクロソフト コーポレーション     

Thomas P Conlon, Amir Netz, Cristian Petculescu, ネッツ アミル, ペトカレスキュ クリスチャン, ピー.コンロン トーマス
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2004303212A
CLAIM 55
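A data packet that facilitates data analysis and is transmitted between two or more computer [コンピュ] (processing data) components, the data packet comprising dynamic multidimensional analysis data based in part on a proactive cache structure.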
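
As a reading aid for claim 1 of US8190610B2, the recited steps (a separate map function per data group, intermediate data identifiable to its group, and a reduce that merges the groups on a key in common using one iterator per group) can be sketched as follows. This is a single-process illustration under stated assumptions, not the patentee's implementation: the employee and department records, the group labels and every function name are hypothetical, and the input partitioning and distribution across machines are omitted.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key dept_id.
employees = [("Alice", 1), ("Bob", 2), ("Carol", 1)]   # (name, dept_id)
departments = [(1, "Search"), (2, "Ads")]              # (dept_id, dept_name)


def map_employee(record):
    """Map function for the employee group: emit (dept_id, name)."""
    name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Map function for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def run_map(group_id, records, map_fn, intermediate):
    """Apply one group's map function and tag its output with the group id."""
    for record in records:
        for key, value in map_fn(record):
            intermediate[key][group_id].append(value)


def reduce_join(key, emp_iter, dept_iter):
    """Reduce one key, consuming a separate iterator for each group."""
    dept_names = list(dept_iter)
    for name in emp_iter:
        for dept_name in dept_names:
            yield (key, name, dept_name)


def grouped_map_reduce():
    # intermediate[key][group_id] -> list of values emitted by that group
    intermediate = defaultdict(lambda: defaultdict(list))
    run_map("emp", employees, map_employee, intermediate)
    run_map("dept", departments, map_department, intermediate)
    output = []
    for key in sorted(intermediate):
        groups = intermediate[key]
        output.extend(reduce_join(key, iter(groups["emp"]), iter(groups["dept"])))
    return output


result = grouped_map_reduce()
```

The output rows carry a schema (dept_id, name, dept_name) that differs from both input schemas, which is the merging effect the claim describes.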

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata (ユーザ開始動作) to the processing of the reducing step .
JP2004303212A
CLAIM 27
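The database serving method according to claim 24, further comprising canceling the processing of the multidimensional object when at least one selected from the group consisting of automatic condition detection and user-initiated action [ユーザ開始動作] (associated metadata) detection occurs.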

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2004303212A
CLAIM 55
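A data packet that facilitates data analysis and is transmitted between two or more computer [コンピュ] (processing data) components, the data packet comprising dynamic multidimensional analysis data based in part on a proactive cache structure.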

US8190610B2
CLAIM 17
. A computer system (可読媒体) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2004303212A
CLAIM 61
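A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.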

US8190610B2
CLAIM 18
. The computer system (可読媒体) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JP2004303212A
CLAIM 61
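A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.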

US8190610B2
CLAIM 19
. The computer system (可読媒体) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2004303212A
CLAIM 61
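A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.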

US8190610B2
CLAIM 20
. The computer system (可読媒体) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2004303212A
CLAIM 61
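A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.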

US8190610B2
CLAIM 21
. The computer system (可読媒体) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2004303212A
CLAIM 61
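A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.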

US8190610B2
CLAIM 22
. The computer system (可読媒体) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2004303212A
CLAIM 61
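A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.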

US8190610B2
CLAIM 23
. The computer system (可読媒体) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2004303212A
CLAIM 61
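A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.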

US8190610B2
CLAIM 24
. The computer system (可読媒体) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2004303212A
CLAIM 61
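A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.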

US8190610B2
CLAIM 25
. The computer system (可読媒体) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2004303212A
CLAIM 61
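A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.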

US8190610B2
CLAIM 26
. The computer system (可読媒体) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2004303212A
CLAIM 61
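A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.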

US8190610B2
CLAIM 27
. The computer system (可読媒体) of claim 26 , wherein : the reducing includes processing the metadata .
JP2004303212A
CLAIM 61
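A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.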

US8190610B2
CLAIM 28
. The computer system (可読媒体) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata (ユーザ開始動作) to the reducing .
JP2004303212A
CLAIM 27
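The database serving method according to claim 24, further comprising canceling the processing of the multidimensional object when at least one selected from the group consisting of automatic condition detection and user-initiated action [ユーザ開始動作] (associated metadata) detection occurs.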

JP2004303212A
CLAIM 61
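A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.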

US8190610B2
CLAIM 29
. The computer system (可読媒体) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2004303212A
CLAIM 55
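A data packet that facilitates data analysis and is transmitted between two or more computer [コンピュ] (processing data) components, the data packet comprising dynamic multidimensional analysis data based in part on a proactive cache structure.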

JP2004303212A
CLAIM 61
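A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.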

US8190610B2
CLAIM 30
. The computer system (可読媒体) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2004303212A
CLAIM 61
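A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.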

US8190610B2
CLAIM 31
. The computer system (可読媒体) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2004303212A
CLAIM 61
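A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.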

US8190610B2
CLAIM 32
. The computer system (可読媒体) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JP2004303212A
CLAIM 61
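A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.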

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system (可読媒体) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (パラメータ) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2004303212A
CLAIM 52
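A proactive caching method comprising: providing at least one multidimensional object that provides dynamic multidimensional analysis data derived from a database; building at least one cache that provides dynamic multidimensional analysis data derived from the at least one multidimensional object; providing an input that determines a cache rebuild parameter [パラメータ] (output data set); determining whether a change to the associated multidimensional object has occurred; switching the operating mode of the analysis component to access the multidimensional object when the cache rebuild parameter is satisfied; and rebuilding the cache based on the associated multidimensional object.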

JP2004303212A
CLAIM 55
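A data packet that facilitates data analysis and is transmitted between two or more computer [コンピュ] (processing data) components, the data packet comprising dynamic multidimensional analysis data based in part on a proactive cache structure.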

JP2004303212A
CLAIM 61
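A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, the components constituting a proactive caching system that provides information associated with a data set based at least in part on at least one OLAP variant cache.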

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (パラメータ) is a merging of a portion of the first and second intermediate data set .
JP2004303212A
CLAIM 52
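A proactive caching method comprising: providing at least one multidimensional object that provides dynamic multidimensional analysis data derived from a database; building at least one cache that provides dynamic multidimensional analysis data derived from the at least one multidimensional object; providing an input that determines a cache rebuild parameter [パラメータ] (output data set); determining whether a change to an associated multidimensional object has occurred; switching an operating mode of the analysis component to access the multidimensional object when the cache rebuild parameter is satisfied; and rebuilding the cache based on the associated multidimensional object.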

US8190610B2
CLAIM 39
. The map-reduce method of claim 38 , wherein iterating includes providing the associated metadata (ユーザ開始動作) to the processing of the reducing step .
JP2004303212A
CLAIM 27
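The database serving method of claim 24, further comprising canceling the processing of the multidimensional object upon occurrence of at least one selected from the group consisting of automatic condition detection and user-initiated action [ユーザ開始動作] (associated metadata) detection.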

US8190610B2
CLAIM 40
. A computer system (可読媒体) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (パラメータ) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2004303212A
CLAIM 52
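A proactive caching method comprising: providing at least one multidimensional object that provides dynamic multidimensional analysis data derived from a database; building at least one cache that provides dynamic multidimensional analysis data derived from the at least one multidimensional object; providing an input that determines a cache rebuild parameter [パラメータ] (output data set); determining whether a change to an associated multidimensional object has occurred; switching an operating mode of the analysis component to access the multidimensional object when the cache rebuild parameter is satisfied; and rebuilding the cache based on the associated multidimensional object.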

JP2004303212A
CLAIM 61
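A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.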

US8190610B2
CLAIM 41
. The computer system (可読媒体) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (パラメータ) is a merging of a portion of the first and second intermediate data set .
JP2004303212A
CLAIM 52
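A proactive caching method comprising: providing at least one multidimensional object that provides dynamic multidimensional analysis data derived from a database; building at least one cache that provides dynamic multidimensional analysis data derived from the at least one multidimensional object; providing an input that determines a cache rebuild parameter [パラメータ] (output data set); determining whether a change to an associated multidimensional object has occurred; switching an operating mode of the analysis component to access the multidimensional object when the cache rebuild parameter is satisfied; and rebuilding the cache based on the associated multidimensional object.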

JP2004303212A
CLAIM 61
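A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.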

US8190610B2
CLAIM 42
. The computer system (可読媒体) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , and the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2004303212A
CLAIM 61
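A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.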
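
For orientation only, the partitioning recited in US8190610B2 claim 42 above (some intermediate key-value pairs provided to more than one partition, each partition going to a separate reducer) might be sketched as follows. This is a minimal single-process illustration, not the patentee's implementation: the function name `partition_with_replication`, the `replicate_keys` parameter, and the CRC32-based placement are all assumptions introduced here.

```python
import zlib


def partition_with_replication(pairs, n_reducers, replicate_keys):
    """Split intermediate key-value pairs into one partition per reducer.

    Hypothetical policy: keys listed in replicate_keys are copied to every
    partition; all other keys go to exactly one partition chosen by a
    stable CRC32 hash of the key's string form.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if key in replicate_keys:
            # Replicated pair: provided to more than one partition.
            for part in partitions:
                part.append((key, value))
        else:
            index = zlib.crc32(str(key).encode("utf-8")) % n_reducers
            partitions[index].append((key, value))
    return partitions


if __name__ == "__main__":
    sample = [("k1", "a"), ("k2", "b"), ("shared", "c")]
    for i, part in enumerate(partition_with_replication(sample, 2, {"shared"})):
        print(f"reducer {i}: {part}")
```

Replicating every key to every partition would correspond to the situation in claim 43, where all key-value pairs reach all reducers.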

US8190610B2
CLAIM 43
. The computer system (可読媒体) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
JP2004303212A
CLAIM 61
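A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.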

US8190610B2
CLAIM 44
. The computer system (可読媒体) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2004303212A
CLAIM 61
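A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.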

US8190610B2
CLAIM 45
. The computer system (可読媒体) of claim 44 , wherein the reducing includes processing the metadata .
JP2004303212A
CLAIM 61
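A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.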

US8190610B2
CLAIM 46
. The computer system (可読媒体) of claim 45 , wherein iterating includes providing the associated metadata (ユーザ開始動作) to the processing of the reducing step .
JP2004303212A
CLAIM 27
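The database serving method of claim 24, further comprising canceling the processing of the multidimensional object upon occurrence of at least one selected from the group consisting of automatic condition detection and user-initiated action [ユーザ開始動作] (associated metadata) detection.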

JP2004303212A
CLAIM 61
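A computer-readable medium [可読媒体] (computer system) storing computer-executable components of a system that facilitates data analysis, comprising a proactive caching system that provides information associated with a data set based at least in part on at least one modified OLAP cache.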




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1564654A1

Filed: 2004-02-10     Issued: 2005-08-17

Apparatus and method for determining synchronization status of database copies connected by a radio air interface of a radio communication system

(Original Assignee) Research in Motion Ltd     (Current Assignee) BlackBerry Ltd

David Paul Yach, Barry Warren Linkert, Jie Zhu, Salim Hayder Omar, Piotr K. Tysowski, Albert Hecht-Enns, Catherine Phillips, Kathy Ann Pereira
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (mobile node) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (selected portions) to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1564654A1
CLAIM 1
In a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node (data partition) having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of apparatus for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said apparatus comprising : a group hash generator embodied at the mobile node and adapted to receive indications of at least selected portions (map function's corresponding data partition) of at least selected data records of the at least the first mobile copy , said group hash generator selectably for forming a group hash value formed of aggregated hash values aggregated from individual record hashes representative of at least a first selected group of the selected data records , the group hash values for communication to the network part to determine whether the first network copy database and the first mobile copy database are in match with one another .
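
For orientation only, the method recited in US8190610B2 claim 1 above (per-group mapping of differently structured data, followed by a reduce that merges the intermediate data on a key in common) might be sketched as follows. This is a minimal single-process illustration, not a distributed implementation and not the patentee's code: the sample records, the `dept_id` key, and every function name are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key "dept_id".
EMPLOYEES = [
    ("e1", {"name": "Ann", "dept_id": 10}),
    ("e2", {"name": "Bob", "dept_id": 20}),
    ("e3", {"name": "Cy", "dept_id": 10}),
]
DEPARTMENTS = [
    ("d1", {"dept_id": 10, "dept_name": "Sales"}),
    ("d2", {"dept_id": 20, "dept_name": "Ops"}),
]


def map_employee(_key, value):
    """Group-specific map: emit (dept_id, employee name)."""
    yield value["dept_id"], value["name"]


def map_department(_key, value):
    """Group-specific map: emit (dept_id, department name)."""
    yield value["dept_id"], value["dept_name"]


def partition(records, n):
    """Split one data group into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def run_map(records, map_fn, n_partitions=2):
    """Map each partition, collecting a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for key, value in part:
            for out_key, out_value in map_fn(key, value):
                intermediate[out_key].append(out_value)
    return dict(intermediate)


def reduce_join(key, employee_iter, department_iter):
    """Reduce with one iterator per group, merging on the common key."""
    dept_names = list(department_iter)
    for name in employee_iter:
        for dept_name in dept_names:
            yield key, name, dept_name


def run():
    # Intermediate data stays identifiable to its group via the dict key.
    groups = {
        "emp": run_map(EMPLOYEES, map_employee),
        "dept": run_map(DEPARTMENTS, map_department),
    }
    output = []
    for key in sorted(set(groups["emp"]) & set(groups["dept"])):
        output.extend(
            reduce_join(key, iter(groups["emp"][key]), iter(groups["dept"][key]))
        )
    return output


if __name__ == "__main__":
    print(run())
```

The output rows combine fields from both groups, so their schema differs from either input schema, which is the join-like behaviour the claim language describes.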

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (mobile node) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (selected portions) to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1564654A1
CLAIM 1
In a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node (data partition) having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of apparatus for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said apparatus comprising : a group hash generator embodied at the mobile node and adapted to receive indications of at least selected portions (map function's corresponding data partition) of at least selected data records of the at least the first mobile copy , said group hash generator selectably for forming a group hash value formed of aggregated hash values aggregated from individual record hashes representative of at least a first selected group of the selected data records , the group hash values for communication to the network part to determine whether the first network copy database and the first mobile copy database are in match with one another .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (mobile node) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (selected portions) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1564654A1
CLAIM 1
In a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node (data partition) having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of apparatus for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said apparatus comprising : a group hash generator embodied at the mobile node and adapted to receive indications of at least selected portions (map function's corresponding data partition) of at least selected data records of the at least the first mobile copy , said group hash generator selectably for forming a group hash value formed of aggregated hash values aggregated from individual record hashes representative of at least a first selected group of the selected data records , the group hash values for communication to the network part to determine whether the first network copy database and the first mobile copy database are in match with one another .

EP1564654A1
CLAIM 15
In a method of communicating in a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of a method for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said method comprising : aggregating together individual record hashes of individual data records of at least a first selected group of data records of the at least the first mobile copy to form a group hash value ;
sending the group hash value formed during said operation of aggregating to the network part ;
comparing the group hash value sent to the network part during said operation of sending with a corresponding network generated value ;
and determining whether the group hash value corresponds in value (output data set) with the corresponding network generated value .
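
For orientation only, the comparison recited in EP1564654A1 claim 15 above (aggregate individual record hashes into a group hash, send it, and compare it with a network-generated value) might be sketched as follows. This is a minimal illustration under stated assumptions: the claim does not name a hash function or an aggregation operator, so SHA-256 and XOR are choices made here, and the function names are hypothetical.

```python
import hashlib


def record_hash(record: bytes) -> int:
    """Hash one data record (SHA-256 is an assumption, not from the claim)."""
    return int.from_bytes(hashlib.sha256(record).digest(), "big")


def group_hash(records) -> int:
    """Aggregate individual record hashes into one group hash value.

    XOR is an assumed aggregation; it is order-independent, but two
    identical records cancel each other out.
    """
    aggregated = 0
    for record in records:
        aggregated ^= record_hash(record)
    return aggregated


def in_match(mobile_records, network_records) -> bool:
    """Compare the mobile group hash with the network-generated value."""
    return group_hash(mobile_records) == group_hash(network_records)


if __name__ == "__main__":
    mobile = [b"alice:555-0100", b"bob:555-0101"]
    network = [b"bob:555-0101", b"alice:555-0100"]
    print(in_match(mobile, network))
```

Only the single group hash value crosses the air interface in this sketch, which is the bandwidth-saving point of comparing aggregates rather than whole records.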

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
EP1564654A1
CLAIM 15
In a method of communicating in a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of a method for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said method comprising : aggregating together individual record hashes of individual data records of at least a first selected group of data records of the at least the first mobile copy to form a group hash value ;
sending the group hash value formed during said operation of aggregating to the network part ;
comparing the group hash value sent to the network part during said operation of sending with a corresponding network generated value ;
and determining whether the group hash value corresponds in value (output data set) with the corresponding network generated value .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (mobile node) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (selected portions) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1564654A1
CLAIM 1
In a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node (data partition) having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of apparatus for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said apparatus comprising : a group hash generator embodied at the mobile node and adapted to receive indications of at least selected portions (map function's corresponding data partition) of at least selected data records of the at least the first mobile copy , said group hash generator selectably for forming a group hash value formed of aggregated hash values aggregated from individual record hashes representative of at least a first selected group of the selected data records , the group hash values for communication to the network part to determine whether the first network copy database and the first mobile copy database are in match with one another .

EP1564654A1
CLAIM 15
In a method of communicating in a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of a method for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said method comprising : aggregating together individual record hashes of individual data records of at least a first selected group of data records of the at least the first mobile copy to form a group hash value ;
sending the group hash value formed during said operation of aggregating to the network part ;
comparing the group hash value sent to the network part during said operation of sending with a corresponding network generated value ;
and determining whether the group hash value corresponds in value (output data set) with the corresponding network generated value .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
EP1564654A1
CLAIM 15
In a method of communicating in a radio communication system comprising a network part having at least a first network copy database maintained thereat and a mobile node having a corresponding at least a first mobile copy database maintained thereat , data of the first network copy database and the first mobile copy database in match with one another when data of each data record of the first network copy database is in complete correspondence with corresponding data of each data record of the first mobile copy database , an improvement of a method for facilitating determination of whether the first network copy database is in match with the first mobile copy database , said method comprising : aggregating together individual record hashes of individual data records of at least a first selected group of data records of the at least the first mobile copy to form a group hash value ;
sending the group hash value formed during said operation of aggregating to the network part ;
comparing the group hash value sent to the network part during said operation of sending with a corresponding network generated value ;
and determining whether the group hash value corresponds in value (output data set) with the corresponding network generated value .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1564658A1

Filed: 2004-02-10     Issued: 2005-08-17

Apparatus and associated method for synchronizing databases by comparing hash values.

(Original Assignee) Research in Motion Ltd     (Current Assignee) BlackBerry Ltd

David Paul Yach, Barry Warren Linkert, Jie Zhu, Salim Hayder Omar, Piotr K Tysowski, Albert Hecht-Enns, Catherine Phillips, Kathy Ann Pereira
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (information representative) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (information representative) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, map function's corresponding data partition to form corresponding intermediate data, map function's corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

EP1564658A1
CLAIM 18
The method of claim 17 further comprising the operations of delivering the at least the portions of the mobile-copy to the network part , comparing the portions of the mobile copy delivered during said operation of delivering with corresponding portions (mapping functions) of the network-copy of the at least the first database , and selectably causing overwriting of the portions of a selected one of the network-copy and the mobile-copy responsive to comparisons made during said operation of comparing the portions of the mobile-copy .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (information representative) is a plurality of output data groups .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, map function's corresponding data partition to form corresponding intermediate data, map function's corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (information representative) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .
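
For orientation, the hash-comparison step recited in EP1564658A1 claim 15 can be pictured with the short sketch below. It is an illustration only, not the reference's implementation: the choice of SHA-256, the dict representation of each database copy, and the names `db_hash` and `needs_more_info` are all assumptions introduced here.

```python
import hashlib

def db_hash(records: dict) -> str:
    """Hash a database copy over its key-sorted entries (assumed scheme)."""
    h = hashlib.sha256()
    for key in sorted(records):
        h.update(f"{key}={records[key]};".encode("utf-8"))
    return h.hexdigest()

def needs_more_info(mobile_hash: str, network_copy: dict) -> bool:
    """Network part: compare the received hash with the hash of its own copy.

    True means the copies are not in match, so additional information
    about the mobile copy would be requested.
    """
    return mobile_hash != db_hash(network_copy)

# Hypothetical sample data for the two copies of the first database.
mobile_copy = {"alice": "555-0100", "bob": "555-0101"}
network_copy = {"alice": "555-0100", "bob": "555-0199"}
matching_copy = {"bob": "555-0101", "alice": "555-0100"}

request_needed = needs_more_info(db_hash(mobile_copy), network_copy)
request_not_needed = needs_more_info(db_hash(mobile_copy), matching_copy)
```

Only the hash travels from the mobile node to the network part in this sketch; the full records would be exchanged only when the hashes differ.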

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (information representative) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (information representative) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .
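
The per-group iterator recited in US8190610B2 claim 5 can be pictured as follows. This is a minimal sketch under stated assumptions: the group names `employee` and `department`, their value layouts, and the function names are hypothetical and chosen only to show one iterator per data group inside a single reduce call.

```python
from typing import Dict, Iterator, List, Tuple

def employee_iter(values: List[tuple]) -> Iterator[str]:
    """Iterator for the hypothetical 'employee' group: yields names."""
    for (name,) in values:
        yield name

def department_iter(values: List[tuple]) -> Iterator[Tuple[str, int]]:
    """Iterator for the hypothetical 'department' group: yields (name, bonus)."""
    for dept_name, bonus in values:
        yield dept_name, bonus

def reduce_one_key(key, grouped: Dict[str, List[tuple]]) -> List[tuple]:
    """Reduce the intermediate values of one key, one iterator per group."""
    departments = list(department_iter(grouped.get("department", [])))
    return [
        (key, name, dept_name, bonus)
        for name in employee_iter(grouped.get("employee", []))
        for dept_name, bonus in departments
    ]

rows = reduce_one_key(
    1,
    {"employee": [("Ann",), ("Cy",)], "department": [("Sales", 100)]},
)
```

Each iterator knows only the value layout of its own group, which is what lets one reduce function consume intermediate data of different schemas.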

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (information representative) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (information representative) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (information representative) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (information representative) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

EP1564658A1
CLAIM 18
The method of claim 17 further comprising the operations of delivering the at least the portions of the mobile-copy to the network part , comparing the portions of the mobile copy delivered during said operation of delivering with corresponding port (mapping functions) ions of the network-copy of the at least the first database , and selectably causing overwriting of the portions of a selected one of the network-copy and the mobile-copy responsive to comparisons made during said operation of comparing the portions of the mobile-copy .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (information representative) is a plurality of output data groups .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (information representative) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (information representative) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (information representative) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (information representative) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (information representative) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (information representative) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (information representative) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

EP1564658A1
CLAIM 18
The method of claim 17 further comprising the operations of delivering the at least the portions of the mobile-copy to the network part , comparing the portions of the mobile copy delivered during said operation of delivering with corresponding port (mapping functions) ions of the network-copy of the at least the first database , and selectably causing overwriting of the portions of a selected one of the network-copy and the mobile-copy responsive to comparisons made during said operation of comparing the portions of the mobile-copy .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (information representative) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (information representative) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1564658A1
CLAIM 15
In a method of communicating in a radio communication system having a network part that maintains at least a network-copy first database containing data and a mobile node that maintains at least a mobile-copy first database containing data , the data of the network-copy and the mobile-copy of the first database , respectively , correspond when the network-copy and the mobile-copy of the first database are in match with one another , an improvement of a method for selectably altering the data of at least one of the network-copy and the mobile-copy of the at least the first database to place the network-copy and the mobile-copy in match with each other , said method comprising : selectably sending first hash information from the mobile node to the network part , the first hash information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the mobile-copy of the first database ;
comparing , at the network part , the first hash information sent during said operation of selectably sending with corresponding network-copy first hash information ;
and selectably requesting additional information regarding the mobile-copy first database responsive to comparisons made during said operation of comparing the first hash information .

EP1564658A1
CLAIM 18
The method of claim 17 further comprising the operations of delivering the at least the portions of the mobile-copy to the network part , comparing the portions of the mobile copy delivered during said operation of delivering with corresponding port (mapping functions) ions of the network-copy of the at least the first database , and selectably causing overwriting of the portions of a selected one of the network-copy and the mobile-copy responsive to comparisons made during said operation of comparing the portions of the mobile-copy .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005196602A

Filed: 2004-01-09     Issued: 2005-07-21

System configuration change method in a shared-nothing database management system

(Original Assignee) Hitachi Ltd

Shinji Fujiwara, Daisuke Ito, Frederico Mashel, Kazutomo Ushijima, マシエル・フレデリコ, 大輔 伊藤, 一智 牛嶋, 真二 藤原
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005196602A
CLAIM 2
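The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program [のプログラム] (corresponding different intermediate data) for sorting and merging data .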
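
For orientation, the method recited in claim 1 of US8190610B2 can be sketched in Python as follows. This is a minimal single-process illustration under stated assumptions, not the patentee's implementation: the two data groups (`employees`, `departments`), their fields, the round-robin partitioning, and every function name are hypothetical, and nothing here is actually distributed.

```python
from collections import defaultdict

# Two hypothetical data groups with different schemas and a key in common ("dept_id").
employees = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Bob", "dept_id": 2},
    {"emp": "Cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept_name": "R&D"},
    {"dept_id": 2, "dept_name": "Sales"},
]


def partition(records, n_parts):
    """Split one data group into n_parts data partitions (round-robin, an assumption)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_employee(part):
    """Map function for the first data group: emit (dept_id, employee name)."""
    return [(r["dept_id"], r["emp"]) for r in part]


def map_department(part):
    """Map function for the second data group: emit (dept_id, department name)."""
    return [(r["dept_id"], r["dept_name"]) for r in part]


def run_map(group_name, records, map_fn, n_parts=2):
    """Map every partition independently; the result stays identifiable to its group."""
    intermediate = defaultdict(list)
    for part in partition(records, n_parts):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return group_name, intermediate


def reduce_join(key, emp_iter, dept_iter):
    """Reduce with one iterator per data group, merging on the common key."""
    dept_names = list(dept_iter)
    return [(key, emp, name) for emp in emp_iter for name in dept_names]


def grouped_map_reduce():
    groups = dict([
        run_map("employees", employees, map_employee),
        run_map("departments", departments, map_department),
    ])
    common_keys = sorted(set(groups["employees"]) & set(groups["departments"]))
    output = []
    for key in common_keys:
        output.extend(
            reduce_join(key, iter(groups["employees"][key]), iter(groups["departments"][key]))
        )
    return output
```

Calling `grouped_map_reduce()` yields one merged row per employee, which makes it easier to judge whether a cited passage discloses per-group mapping and a key-based merge rather than a generic sort and merge.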

US8190610B2
CLAIM 17
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005196602A
CLAIM 2
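The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program [のプログラム] (corresponding different intermediate data) for sorting and merging data .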

US8190610B2
CLAIM 18
. The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JP2005196602A
CLAIM 2
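The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .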

US8190610B2
CLAIM 19
. The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005196602A
CLAIM 2
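The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .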

US8190610B2
CLAIM 20
. The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005196602A
CLAIM 2
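The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .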

US8190610B2
CLAIM 21
. The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005196602A
CLAIM 2
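The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .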

US8190610B2
CLAIM 22
. The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005196602A
CLAIM 2
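The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .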

US8190610B2
CLAIM 23
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005196602A
CLAIM 2
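The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .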

US8190610B2
CLAIM 24
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005196602A
CLAIM 2
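The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .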
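
The two partitioning variants recited in claims 23 and 24 of US8190610B2 (each key/value pair to a separate partition, versus at least some pairs to more than one partition) differ as the following Python sketch suggests. It is a hedged illustration only: the CRC32-based partition choice, the `replicated_keys` parameter, and all names are assumptions that appear nowhere in the patent or in the cited reference.

```python
from zlib import crc32


def partition_index(key, n_partitions: int) -> int:
    """Deterministic partition choice; CRC32 is an illustrative assumption."""
    return crc32(str(key).encode("utf-8")) % n_partitions


def partition_exclusive(pairs, n_partitions):
    """Each key/value pair goes to exactly one partition (cf. claim 23)."""
    partitions = [[] for _ in range(n_partitions)]
    for key, value in pairs:
        partitions[partition_index(key, n_partitions)].append((key, value))
    return partitions


def partition_with_replication(pairs, n_partitions, replicated_keys):
    """Pairs whose key is in replicated_keys go to every partition (cf. claim 24).

    Replicating every key corresponds to the situation in claim 25.
    """
    partitions = [[] for _ in range(n_partitions)]
    for key, value in pairs:
        if key in replicated_keys:
            targets = range(n_partitions)
        else:
            targets = [partition_index(key, n_partitions)]
        for i in targets:
            partitions[i].append((key, value))
    return partitions


# Hypothetical intermediate data; each partition would feed a separate reducer.
pairs = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
exclusive = partition_exclusive(pairs, 3)
replicated = partition_with_replication(pairs, 3, replicated_keys={"b"})
```

In the exclusive variant both `"a"` pairs land in the same partition, whereas in the replicated variant every reducer sees the `"b"` pair.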

US8190610B2
CLAIM 25
. The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2005196602A
CLAIM 2
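The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .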

US8190610B2
CLAIM 26
. The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005196602A
CLAIM 2
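The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .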

US8190610B2
CLAIM 27
. The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JP2005196602A
CLAIM 2
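The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .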

US8190610B2
CLAIM 28
. The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005196602A
CLAIM 2
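The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .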

US8190610B2
CLAIM 29
. The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
JP2005196602A
CLAIM 2
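The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .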

US8190610B2
CLAIM 30
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2005196602A
CLAIM 2
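The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .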

US8190610B2
CLAIM 31
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2005196602A
CLAIM 2
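The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .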

US8190610B2
CLAIM 32
. The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JP2005196602A
CLAIM 2
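The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .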

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (行うこと) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005196602A
CLAIM 2
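The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .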

JP2005196602A
CLAIM 7
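In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .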

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005196602A
CLAIM 7
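In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .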

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005196602A
CLAIM 7
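In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .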

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2005196602A
CLAIM 7
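In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .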

US8190610B2
CLAIM 40
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005196602A
CLAIM 2
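The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .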

JP2005196602A
CLAIM 7
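In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .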

US8190610B2
CLAIM 41
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005196602A
CLAIM 2
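The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .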

JP2005196602A
CLAIM 7
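In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .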

US8190610B2
CLAIM 42
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005196602A
CLAIM 2
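The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .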

JP2005196602A
CLAIM 7
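In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .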

US8190610B2
CLAIM 43
. The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2005196602A
CLAIM 2
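The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .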

JP2005196602A
CLAIM 7
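In a shared-nothing database management system comprising a plurality of database servers , an apparatus for replacing data movement between database servers with a change to a virtual volume of a file [ファイル] (second intermediate data) system , the apparatus including the following elements : 1. storage shared over a network ; 2. a file system that makes a plurality of unit volumes usable as a virtual volume ; 3. a database search program having a mechanism for delegating data movement to the file system . The storage of item 1 has the following mechanism : a. an interface for notifying the file system of the capacity of a unit volume . The file system of item 2 has the following mechanisms : i. a storage virtualization layer for making a plurality of unit volumes usable as a virtual volume ; ii. an interface for notifying the database server of the capacity of a unit volume ; iii. an interface for changing the configuration of the virtual volume in response to a request from the database server . Further , the database search program of item 3 has the following mechanisms : A. data groups , each serving as the minimum unit of movement when data is moved ; B. an interface for querying the file system about individual unit volumes ; C. a mechanism for placing data groups on unit volumes without duplication ; D. an interface for requesting the file system to change the configuration of the virtual volume .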

US8190610B2
CLAIM 44
. The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005196602A
CLAIM 2
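The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .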

US8190610B2
CLAIM 45
. The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JP2005196602A
CLAIM 2
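The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .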

US8190610B2
CLAIM 46
. The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2005196602A
CLAIM 2
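The system configuration change method of claim 1 , wherein adding the server dedicated to sorting and merging the data is carried out [行うこと] (computer system) by launching , on a pool server that is connected to the network and is not functioning as a CPU resource of the shared-nothing database management system , a program for sorting and merging data .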




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040181461A1

Filed: 2003-12-23     Issued: 2004-09-16

Multi-modal sales applications

(Original Assignee) SAP SE     (Current Assignee) SAP SE

Samir Raiyani, Jie Weng, Li Gong, Jordan Anderson, Wai Or, Ju-Kay Kwek, John Hanley
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (additional input) has a different schema (Extensible Markup Language, receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040181461A1
CLAIM 10
. The system of claim 1 wherein the selected input modality is associated with voice input , and the first and second pages are associated with Voice Extensible Markup Language (different schema) (VXML) .

US20040181461A1
CLAIM 22
. A method of providing product data , the method comprising : receiving at an electronic device an identifying input in a first modality , the identifying input identifying a product ;
requesting automatically , after receiving the identifying input , product information ;
providing , from the electronic device , the product information to a user ;
receiving at the electronic device additional input (first data group) in a second modality , the additional input requesting additional product information ;
requesting automatically , after receiving the additional input , the additional product information ;
and providing , from the electronic device , the additional product information to the user .

US20040181461A1
CLAIM 28
. The method of claim 22 wherein receiving input (different schema) in at least one of the first and second modalities comprises receiving a search string and the method further comprises : accessing at least a first part of the search string ;
searching a first search space for a match for the first part of the search string ;
limiting a second search space based on a result of searching the first search space ;
accessing at least a second part of the search string ;
and searching the limited second search space for a match for the second part of the search string .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (Extensible Markup Language, receiving input) than the iterator corresponding to another particular data group , for that reducer .
US20040181461A1
CLAIM 10
. The system of claim 1 wherein the selected input modality is associated with voice input , and the first and second pages are associated with Voice Extensible Markup Language (different schema) (VXML) .

US20040181461A1
CLAIM 28
. The method of claim 22 wherein receiving input (different schema) in at least one of the first and second modalities comprises receiving a search string and the method further comprises : accessing at least a first part of the search string ;
searching a first search space for a match for the first part of the search string ;
limiting a second search space based on a result of searching the first search space ;
accessing at least a second part of the search string ;
and searching the limited second search space for a match for the second part of the search string .

US8190610B2
CLAIM 17
. A computer system (stored data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (additional input) has a different schema (Extensible Markup Language, receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040181461A1
CLAIM 10
. The system of claim 1 wherein the selected input modality is associated with voice input , and the first and second pages are associated with Voice Extensible Markup Language (different schema) (VXML) .

US20040181461A1
CLAIM 22
. A method of providing product data , the method comprising : receiving at an electronic device an identifying input in a first modality , the identifying input identifying a product ;
requesting automatically , after receiving the identifying input , product information ;
providing , from the electronic device , the product information to a user ;
receiving at the electronic device additional input (first data group) in a second modality , the additional input requesting additional product information ;
requesting automatically , after receiving the additional input , the additional product information ;
and providing , from the electronic device , the additional product information to the user .

US20040181461A1
CLAIM 28
. The method of claim 22 wherein receiving input (different schema) in at least one of the first and second modalities comprises receiving a search string and the method further comprises : accessing at least a first part of the search string ;
searching a first search space for a match for the first part of the search string ;
limiting a second search space based on a result of searching the first search space ;
accessing at least a second part of the search string ;
and searching the limited second search space for a match for the second part of the search string .

US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 18
. The computer system (stored data) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 19
. The computer system (stored data) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 20
. The computer system (stored data) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 21
. The computer system (stored data) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 22
. The computer system (stored data) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (Extensible Markup Language, receiving input) than the iterator corresponding to another particular data group , for that reducer .
US20040181461A1
CLAIM 10
. The system of claim 1 wherein the selected input modality is associated with voice input , and the first and second pages are associated with Voice Extensible Markup Language (different schema) (VXML) .

US20040181461A1
CLAIM 28
. The method of claim 22 wherein receiving input (different schema) in at least one of the first and second modalities comprises receiving a search string and the method further comprises : accessing at least a first part of the search string ;
searching a first search space for a match for the first part of the search string ;
limiting a second search space based on a result of searching the first search space ;
accessing at least a second part of the search string ;
and searching the limited second search space for a match for the second part of the search string .

US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 23
. The computer system (stored data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 24
. The computer system (stored data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 25
. The computer system (stored data) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .
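
Illustrative note on claims 23 to 25 of US8190610B2 charted above: those claims describe partitioning the intermediate key/value pairs before handing each partition to a separate reducer, either with each pair going to exactly one partition (claim 23), with some pairs going to more than one partition (claim 24), or with all pairs going to all partitions (claim 25). The following is a minimal sketch of those three strategies, not the patentee's implementation. The function names, the integer keys, and the modulo-hash rule are all assumptions made for illustration.

```python
# Hypothetical sketch: three ways to partition intermediate key/value pairs
# across n reducers. Integer keys are assumed so hash() is deterministic.

def hash_partition(pairs, n):
    """Each pair goes to exactly one partition (cf. claim 23)."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[hash(key) % n].append((key, value))
    return parts


def partial_replicate(pairs, n, replicated_keys):
    """Pairs whose key is in replicated_keys go to every partition;
    the remaining pairs are hash-partitioned (cf. claim 24)."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        if key in replicated_keys:
            for part in parts:
                part.append((key, value))
        else:
            parts[hash(key) % n].append((key, value))
    return parts


def broadcast_partition(pairs, n):
    """Every pair goes to every partition (cf. claim 25)."""
    return [list(pairs) for _ in range(n)]


if __name__ == "__main__":
    intermediate = [(10, "a"), (11, "b"), (12, "c")]
    print(hash_partition(intermediate, 2))
    print(partial_replicate(intermediate, 2, {11}))
    print(broadcast_partition(intermediate, 2))
```

Replicating pairs to several partitions trades extra data movement for the ability of each reducer to see values it needs for a cross-group comparison; whether the charted reference discloses anything comparable is a separate question that this sketch does not address.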

US8190610B2
CLAIM 26
. The computer system (stored data) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 27
. The computer system (stored data) of claim 26 , wherein : the reducing includes processing the metadata .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 28
. The computer system (stored data) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 29
. The computer system (stored data) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 30
. The computer system (stored data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 31
. The computer system (stored data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 32
. The computer system (stored data) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (Extensible Markup Language, receiving input) over a computer system (stored data) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (additional input) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040181461A1
CLAIM 10
. The system of claim 1 wherein the selected input modality is associated with voice input , and the first and second pages are associated with Voice Extensible Markup Language (different schema) (VXML) .

US20040181461A1
CLAIM 22
. A method of providing product data , the method comprising : receiving at an electronic device an identifying input in a first modality , the identifying input identifying a product ;
requesting automatically , after receiving the identifying input , product information ;
providing , from the electronic device , the product information to a user ;
receiving at the electronic device additional input (first data group) in a second modality , the additional input requesting additional product information ;
requesting automatically , after receiving the additional input , the additional product information ;
and providing , from the electronic device , the additional product information to the user .

US20040181461A1
CLAIM 28
. The method of claim 22 wherein receiving input (different schema) in at least one of the first and second modalities comprises receiving a search string and the method further comprises : accessing at least a first part of the search string ;
searching a first search space for a match for the first part of the search string ;
limiting a second search space based on a result of searching the first search space ;
accessing at least a second part of the search string ;
and searching the limited second search space for a match for the second part of the search string .

US20040181461A1
CLAIM 30
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : providing a first set (first set) of options to the user , the first set of options relating to a first parameter of a search string , and being provided to the user in a page ;
accepting a first input from the user , the first input being selected from the first set of options ;
limiting a second set (second set) of options based on the accepted first input , the second set of options relating to a second parameter of the search string ;
and providing the second set of options to the user in the page , such that the user is presented with a single page that provides the first set of options and the second set of options .

US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .
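
Illustrative note on claim 33 of US8190610B2 charted above: the claim recites mapping two data sets with different schemas independently, then reducing both intermediate data sets together by iterating on a key they have in common, producing output with a third schema. The sketch below shows one way such a flow could look. It is an assumption-laden illustration, not the patented implementation: the sample records, the field names (`dept_id`, `name`, `dept_name`), and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share the key "dept_id".
EMPLOYEES = [
    {"emp_id": 1, "name": "Ann", "dept_id": 10},
    {"emp_id": 2, "name": "Bob", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
DEPARTMENTS = [
    {"dept_id": 10, "dept_name": "Search"},
    {"dept_id": 20, "dept_name": "Ads"},
]


def map_employee(record):
    """Map function for the first group: emit (dept_id, employee name)."""
    yield record["dept_id"], record["name"]


def map_department(record):
    """Map function for the second group: emit (dept_id, department name)."""
    yield record["dept_id"], record["dept_name"]


def partition(records, n):
    """Split one data set into n partitions (round-robin, an assumption)."""
    return [records[i::n] for i in range(n)]


def run_map(records, map_fn, n_partitions=2):
    """Apply map_fn to each partition and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for record in part:
            for key, value in map_fn(record):
                intermediate[key].append(value)
    return dict(intermediate)


def reduce_join(key, emp_names, dept_names):
    """Reduce both groups' values for one key into rows of a new schema."""
    names = list(emp_names)  # materialize so the values can be re-read
    for dept_name in dept_names:
        for name in names:
            yield {"dept_id": key, "dept_name": dept_name, "employee": name}


def map_reduce_join(employees, departments):
    """Map each group separately, then reduce on the keys they share."""
    emp_inter = run_map(employees, map_employee)
    dept_inter = run_map(departments, map_department)
    output = []
    for key in sorted(emp_inter.keys() & dept_inter.keys()):
        output.extend(reduce_join(key, iter(emp_inter[key]), iter(dept_inter[key])))
    return output


if __name__ == "__main__":
    for row in map_reduce_join(EMPLOYEES, DEPARTMENTS):
        print(row)
```

Keeping the two intermediate data sets separate until the reduce step is what lets the reducer treat each group's values with its own iterator; a single untagged value list would lose the information about which group a value came from.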

US8190610B2
CLAIM 40
. A computer system (stored data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (additional input) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (Extensible Markup Language, receiving input) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040181461A1
CLAIM 10
. The system of claim 1 wherein the selected input modality is associated with voice input , and the first and second pages are associated with Voice Extensible Markup Language (different schema) (VXML) .

US20040181461A1
CLAIM 22
. A method of providing product data , the method comprising : receiving at an electronic device an identifying input in a first modality , the identifying input identifying a product ;
requesting automatically , after receiving the identifying input , product information ;
providing , from the electronic device , the product information to a user ;
receiving at the electronic device additional input (first data group) in a second modality , the additional input requesting additional product information ;
requesting automatically , after receiving the additional input , the additional product information ;
and providing , from the electronic device , the additional product information to the user .

US20040181461A1
CLAIM 28
. The method of claim 22 wherein receiving input (different schema) in at least one of the first and second modalities comprises receiving a search string and the method further comprises : accessing at least a first part of the search string ;
searching a first search space for a match for the first part of the search string ;
limiting a second search space based on a result of searching the first search space ;
accessing at least a second part of the search string ;
and searching the limited second search space for a match for the second part of the search string .

US20040181461A1
CLAIM 30
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : providing a first set (first set) of options to the user , the first set of options relating to a first parameter of a search string , and being provided to the user in a page ;
accepting a first input from the user , the first input being selected from the first set of options ;
limiting a second set (second set) of options based on the accepted first input , the second set of options relating to a second parameter of the search string ;
and providing the second set of options to the user in the page , such that the user is presented with a single page that provides the first set of options and the second set of options .

US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 41
. The computer system (stored data) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 42
. The computer system (stored data) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 43
. The computer system (stored data) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 44
. The computer system (stored data) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 45
. The computer system (stored data) of claim 44 , wherein the reducing includes processing the metadata .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .

US8190610B2
CLAIM 46
. The computer system (stored data) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20040181461A1
CLAIM 31
. The method of claim 22 wherein receiving input in at least one of the first and second modalities comprises : activating a first grammar from among a plurality of independent grammars , the first grammar being identified with a first input category ;
deactivating at least a second grammar from among the plurality of independent grammars ;
inputting spoken data related to the first input category ;
and matching the spoken data to stored data (computer system) within the first grammar .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005135317A

Filed: 2003-10-31     Issued: 2005-05-26

Document management system and document management program (文書管理システムおよび文書管理プログラム)

(Original Assignee) Toshiba Solutions Corp; 東芝ソリューション株式会社     

Katsuhiko Takachio, Yasunori Yokoyama, 康則 横山, 勝彦 高知尾
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one (スタイル) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005135317A
CLAIM 3
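. The document management system according to claim 1 , wherein the input means inputs , together with the XML document to be registered , a style (スタイル) (selected one) sheet in which rules for identifying each part within an XML document are defined , and the dividing means divides the XML document input by the input means into part units based on the style sheet input by the input means .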

JP2005135317A
CLAIM 5
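. A document management program for managing each of a plurality of element parts contained in one XML document as one unit of natural language search targets , using a natural language search system that performs text search based on a search condition statement written in natural language and an XML database that performs XML document search based on a search condition expression written according to predetermined rules , the program causing a computer (コンピュ) (processing data) to function as : input means for inputting an XML document to be registered ;
dividing means for dividing the XML document input by the input means into part units ;
identifier assigning means for assigning , to the XML document input by the input means , a first identifier for uniquely identifying each XML document , and assigning , to each of the parts divided by the dividing means , a second identifier for uniquely identifying each part within that XML document ;
and registration means for registering , in the natural language search system , the text group of each part divided by the dividing means , with the pair of first and second identifiers assigned to that part by the identifier assigning means set as attribute information , and registering , in the XML database , the XML document in which each pair of first and second identifiers has been inserted at its corresponding location .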

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2005135317A
CLAIM 5
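A document management program for managing each of a plurality of element parts contained in one XML document as one unit of natural-language search target, using a natural-language search system that performs text search based on a search condition statement written in natural language and an XML database that performs XML document search based on a search condition expression written according to predetermined rules, the program causing a computer [コンピュ] (processing data) to function as: input means for inputting an XML document to be registered; dividing means for dividing the XML document input by the input means into part units; identifier assigning means for assigning, to the XML document input by the input means, a first identifier for uniquely identifying each XML document, and assigning, to each of the parts divided by the dividing means, a second identifier for uniquely identifying each part within that XML document; and registration means for registering the text group of each part divided by the dividing means in the natural-language search system, with the pair of first and second identifiers assigned to that part by the identifier assigning means set as attribute information, and registering in the XML database the XML document into which the pairs of first and second identifiers have been inserted at their respective corresponding locations.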

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one (スタイル) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005135317A
CLAIM 3
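The document management system according to claim 1, wherein the input means inputs, together with the XML document to be registered, a style [スタイル] (selected one) sheet in which rules for identifying each part within the XML document are defined, and the dividing means divides the XML document input by the input means into part units based on the style sheet input by the input means.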

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2005135317A
CLAIM 5
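A document management program for managing each of a plurality of element parts contained in one XML document as one unit of natural-language search target, using a natural-language search system that performs text search based on a search condition statement written in natural language and an XML database that performs XML document search based on a search condition expression written according to predetermined rules, the program causing a computer [コンピュ] (processing data) to function as: input means for inputting an XML document to be registered; dividing means for dividing the XML document input by the input means into part units; identifier assigning means for assigning, to the XML document input by the input means, a first identifier for uniquely identifying each XML document, and assigning, to each of the parts divided by the dividing means, a second identifier for uniquely identifying each part within that XML document; and registration means for registering the text group of each part divided by the dividing means in the natural-language search system, with the pair of first and second identifiers assigned to that part by the identifier assigning means set as attribute information, and registering in the XML database the XML document into which the pairs of first and second identifiers have been inserted at their respective corresponding locations.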

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one (スタイル) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005135317A
CLAIM 3
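The document management system according to claim 1, wherein the input means inputs, together with the XML document to be registered, a style [スタイル] (selected one) sheet in which rules for identifying each part within the XML document are defined, and the dividing means divides the XML document input by the input means into part units based on the style sheet input by the input means.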

JP2005135317A
CLAIM 5
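A document management program for managing each of a plurality of element parts contained in one XML document as one unit of natural-language search target, using a natural-language search system that performs text search based on a search condition statement written in natural language and an XML database that performs XML document search based on a search condition expression written according to predetermined rules, the program causing a computer [コンピュ] (processing data) to function as: input means for inputting an XML document to be registered; dividing means for dividing the XML document input by the input means into part units; identifier assigning means for assigning, to the XML document input by the input means, a first identifier for uniquely identifying each XML document, and assigning, to each of the parts divided by the dividing means, a second identifier for uniquely identifying each part within that XML document; and registration means for registering the text group of each part divided by the dividing means in the natural-language search system, with the pair of first and second identifiers assigned to that part by the identifier assigning means set as attribute information, and registering in the XML database the XML document into which the pairs of first and second identifiers have been inserted at their respective corresponding locations.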

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one (スタイル) of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005135317A
CLAIM 3
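The document management system according to claim 1, wherein the input means inputs, together with the XML document to be registered, a style [スタイル] (selected one) sheet in which rules for identifying each part within the XML document are defined, and the dividing means divides the XML document input by the input means into part units based on the style sheet input by the input means.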




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040111410A1

Filed: 2003-10-14     Issued: 2004-06-10

Information reservoir

(Original Assignee) Battelle Memorial Institute Inc     (Current Assignee) Battelle Memorial Institute Inc

David Burgoon, Mark Davis, Kevin Dorow, Todd Hitt, Douglas Mooney, Steven Rust, Loraine Sinnott
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (sampling rate) to form corresponding intermediate data (sampling rate) for that data group and identifiable to that data group , wherein the data of a first data (said sub) group has a different schema (original query) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (corresponding data partition to form corresponding intermediate data, corresponding intermediate data, corresponding data partition) equal to the adjusted rate of inclusion .

US20040111410A1
CLAIM 53
. A computer-implemented method as claimed in claim 9 wherein said method further comprises : representing said sub (first data) set of said data source schema as a directed , acyclic graph having tables as vertices and table relationships as directed edges , said edges defining ancestor-descendant relationships between tuples in said data source ;
traversing said vertices of said acyclic graph ;
sampling each tuple associated with said vertices as each vertex is visited ;
copying each tuple selected through sampling into said representation ;
and optionally copying ancestor and descendant tuples associated with each tuple selected through sampling into said representation .

US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US20040111410A1
CLAIM 84
. A system as claimed in claim 78 wherein said software implements an analyst component for : intercepting an original query (different schema) ;
remapping said original query into a format compatible with said representation ;
applying said remapped query against said representation ;
and providing the results of the remapped query in response to said original query .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data (sampling rate) for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (corresponding data partition to form corresponding intermediate data, corresponding intermediate data, corresponding data partition) equal to the adjusted rate of inclusion .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data (sampling rate) for a data group being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (corresponding data partition to form corresponding intermediate data, corresponding intermediate data, corresponding data partition) equal to the adjusted rate of inclusion .

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (above steps) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20040111410A1
CLAIM 32
. A computer-implemented method as claimed in claim 9 further comprising determining an estimate of the size of said representation by : obtaining the number of child tuples for a single relationship ;
determining whether a target or an induced inclusion probability dominates ;
calculating an average actual inclusion probability of a parent table ;
and repeating the above steps (partitioning step, intermediate data processing step) recursively until an estimate of the expected size of said representation results .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (n points) , for that reducer , operates according to a different key (random number) of a different schema (original query) than the iterator corresponding to another particular data group , for that reducer .
US20040111410A1
CLAIM 41
. A computer-implemented method as claimed in claim 40 wherein said event is that a uniform random number (different key) on the open interval (0 , 1) is less than said adjusted rate of inclusion .

US20040111410A1
CLAIM 56
. A computer-implemented method as claimed in claim 53 wherein said act of traversing said vertices comprises : identifying a subset of the vertices as sampling initiation points (particular data group) ;
performing a breadth-first traversal of those vertices identified as sampling initiation points ;
traversing all vertices that can be reached from a sampling initiation point via pathways that follow the direction of said directed edges ;
and traversing all vertices that can be reached from a sampling initiation point via pathways that follow the opposite direction of said directed edges .

US20040111410A1
CLAIM 84
. A system as claimed in claim 78 wherein said software implements an analyst component for : intercepting an original query (different schema) ;
remapping said original query into a format compatible with said representation ;
applying said remapped query against said representation ;
and providing the results of the remapped query in response to said original query .
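
The inclusion test recited in US20040111410A1 claim 41, cited above, can be illustrated with a short sketch. This is an assumption-laden illustration, not the reference's implementation: the function name `sample_tuples` and its parameters are hypothetical, and Python's `random.random()` draws from the half-open interval [0.0, 1.0) rather than the open interval (0, 1) recited in the claim.

```python
import random


def sample_tuples(tuples, rate_of_inclusion, rng=None):
    """Keep each tuple when a uniform draw falls below the rate of inclusion.

    Hypothetical helper: `rate_of_inclusion` stands in for the "adjusted rate
    of inclusion" of the cited claim; `rng` lets callers pass a seeded generator.
    """
    if rng is None:
        rng = random.Random()
    return [t for t in tuples if rng.random() < rate_of_inclusion]


if __name__ == "__main__":
    rows = ["t1", "t2", "t3", "t4"]
    # A seeded generator makes the draw repeatable across runs.
    print(sample_tuples(rows, 0.5, random.Random(42)))
```

Passing a rate of 1.0 keeps every tuple and a rate of 0.0 keeps none, which gives a quick sanity check on the comparison direction.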

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step (above steps) of the reducing step further comprises processing data that is not intermediate data .
US20040111410A1
CLAIM 32
. A computer-implemented method as claimed in claim 9 further comprising determining an estimate of the size of said representation by : obtaining the number of child tuples for a single relationship ;
determining whether a target or an induced inclusion probability dominates ;
calculating an average actual inclusion probability of a parent table ;
and repeating the above steps (partitioning step, intermediate data processing step) recursively until an estimate of the expected size of said representation results .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data, real time) that is associated with another reducer .
US20040111410A1
CLAIM 14
. A computer-implemented method as claimed in claim 9 wherein : said representation is to be used to respond to queries against a parent table that are restricted to parents of a particular kind of child type ;
and said representation further includes data (includes data) added to said representation that is indicative of whether a select tuple in said parent table is associated with said particular kind of child type .

US20040111410A1
CLAIM 66
. A computer-implemented method as claimed in claim 64 wherein changes are identified in at least near real time (includes data) .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data, real time) that is associated with that reducer .
US20040111410A1
CLAIM 14
. A computer-implemented method as claimed in claim 9 wherein : said representation is to be used to respond to queries against a parent table that are restricted to parents of a particular kind of child type ;
and said representation further includes data (includes data) added to said representation that is indicative of whether a select tuple in said parent table is associated with said particular kind of child type .

US20040111410A1
CLAIM 66
. A computer-implemented method as claimed in claim 64 wherein changes are identified in at least near real time (includes data) .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (sampling rate) to form corresponding intermediate data (sampling rate) for that data group and identifiable to that data group , wherein the data of a first data (said sub) group has a different schema (original query) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (corresponding data partition to form corresponding intermediate data, corresponding intermediate data, corresponding data partition) equal to the adjusted rate of inclusion .

US20040111410A1
CLAIM 53
. A computer-implemented method as claimed in claim 9 wherein said method further comprises : representing said sub (first data) set of said data source schema as a directed , acyclic graph having tables as vertices and table relationships as directed edges , said edges defining ancestor-descendant relationships between tuples in said data source ;
traversing said vertices of said acyclic graph ;
sampling each tuple associated with said vertices as each vertex is visited ;
copying each tuple selected through sampling into said representation ;
and optionally copying ancestor and descendant tuples associated with each tuple selected through sampling into said representation .

US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US20040111410A1
CLAIM 84
. A system as claimed in claim 78 wherein said software implements an analyst component for : intercepting an original query (different schema) ;
remapping said original query into a format compatible with said representation ;
applying said remapped query against said representation ;
and providing the results of the remapped query in response to said original query .

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data (sampling rate) for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (corresponding data partition to form corresponding intermediate data, corresponding intermediate data, corresponding data partition) equal to the adjusted rate of inclusion .

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data (sampling rate) for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (corresponding data partition to form corresponding intermediate data, corresponding intermediate data, corresponding data partition) equal to the adjusted rate of inclusion .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (n points) , for that reducer , is configured to operate according to a different key (random number) of a different schema (original query) than the iterator corresponding to another particular data group , for that reducer .
US20040111410A1
CLAIM 41
. A computer-implemented method as claimed in claim 40 wherein said event is that a uniform random number (different key) on the open interval (0 , 1) is less than said adjusted rate of inclusion .

US20040111410A1
CLAIM 56
. A computer-implemented method as claimed in claim 53 wherein said act of traversing said vertices comprises : identifying a subset of the vertices as sampling initiation points (particular data group) ;
performing a breadth-first traversal of those vertices identified as sampling initiation points ;
traversing all vertices that can be reached from a sampling initiation point via pathways that follow the direction of said directed edges ;
and traversing all vertices that can be reached from a sampling initiation point via pathways that follow the opposite direction of said directed edges .

US20040111410A1
CLAIM 84
. A system as claimed in claim 78 wherein said software implements an analyst component for : intercepting an original query (different schema) ;
remapping said original query into a format compatible with said representation ;
applying said remapped query against said representation ;
and providing the results of the remapped query in response to said original query .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data, real time) that is associated with another reducer .
US20040111410A1
CLAIM 14
. A computer-implemented method as claimed in claim 9 wherein : said representation is to be used to respond to queries against a parent table that are restricted to parents of a particular kind of child type ;
and said representation further includes data (includes data) added to said representation that is indicative of whether a select tuple in said parent table is associated with said particular kind of child type .

US20040111410A1
CLAIM 66
. A computer-implemented method as claimed in claim 64 wherein changes are identified in at least near real time (includes data) .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data, real time) that is associated with that reducer .
US20040111410A1
CLAIM 14
. A computer-implemented method as claimed in claim 9 wherein : said representation is to be used to respond to queries against a parent table that are restricted to parents of a particular kind of child type ;
and said representation further includes data (includes data) added to said representation that is indicative of whether a select tuple in said parent table is associated with said particular kind of child type .

US20040111410A1
CLAIM 66
. A computer-implemented method as claimed in claim 64 wherein changes are identified in at least near real time (includes data) .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (original query) over a computer system , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data (said sub) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (sampling rate) to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040111410A1
CLAIM 1
. A computer-implemented information reservoir creation process wherein : a table collection is constructed from a data source ;
said table collection includes a subset of tables designated as sampling initiation tables ;
each table in said table collection is a member of either a directly-sampled table set or a descendent-sampled table set ;
said directly-sampled table set is characterized by tables that are either sampling initiation tables or ancestor tables to one or more sampling initiation tables ;
said descendant-sampled table set is characterized by tables that are descendant tables to a sampling initiation table ;
said table collection is characterized by a table collection schema equivalent to a data source schema of said data source , with the exception that a list of attributes for each table of said directly-sampled table set includes an additional attribute containing actual rate of inclusion value (output data set) s ;
each tuple included in said table collection is equivalent to one and only one tuple in the corresponding table of said data source ;
an actual rate of inclusion value stored with a select data source tuple and included in a directly-sampled table of said table collection represents the probability that a randomly selected table collection produced by the process will contain said select data source tuple .

US20040111410A1
CLAIM 22
. A computer-implemented method as claimed in claim 16 wherein knowledge of an anticipated workload is encoded into a first set (first set) of queries that are representative of said knowledge of said anticipated workload to derive weighting factors used to establish said target rates of inclusion .

US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data, s corresponding data partition) equal to the adjusted rate of inclusion .

US20040111410A1
CLAIM 53
. A computer-implemented method as claimed in claim 9 wherein said method further comprises : representing said sub (first data) set of said data source schema as a directed , acyclic graph having tables as vertices and table relationships as directed edges , said edges defining ancestor-descendant relationships between tuples in said data source ;
traversing said vertices of said acyclic graph ;
sampling each tuple associated with said vertices as each vertex is visited ;
copying each tuple selected through sampling into said representation ;
and optionally copying ancestor and descendant tuples associated with each tuple selected through sampling into said representation .

US20040111410A1
CLAIM 70
. A computer-implemented method as claimed in claim 9 further comprising maintaining the relative size of said representation by : identifying bounds for said representation ;
identifying a change to said data source ;
updating said representation based upon said change to said data source ;
performing a first set of operations if said representation is below said bounds comprising drawing a supplementary sample from said data source and joining said supplementary sample to said representation if deletions to said data source occur more frequently than additions to said data source ;
performing a second set (second set) of operations if said representation is within said bounds comprising allowing maintenance to said representation based upon said update ;
and performing a third set of operations if said representation is above said bounds comprising assigning a deletion inclusion probability to each tuple in said representation and subsampling said representation based upon said deletion inclusion probabilities .

US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US20040111410A1
CLAIM 84
. A system as claimed in claim 78 wherein said software implements an analyst component for : intercepting an original query (different schema) ;
remapping said original query into a format compatible with said representation ;
applying said remapped query against said representation ;
and providing the results of the remapped query in response to said original query .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040111410A1
CLAIM 1
. A computer-implemented information reservoir creation process wherein : a table collection is constructed from a data source ;
said table collection includes a subset of tables designated as sampling initiation tables ;
each table in said table collection is a member of either a directly-sampled table set or a descendent-sampled table set ;
said directly-sampled table set is characterized by tables that are either sampling initiation tables or ancestor tables to one or more sampling initiation tables ;
said descendant-sampled table set is characterized by tables that are descendant tables to a sampling initiation table ;
said table collection is characterized by a table collection schema equivalent to a data source schema of said data source , with the exception that a list of attributes for each table of said directly-sampled table set includes an additional attribute containing actual rate of inclusion value (output data set) s ;
each tuple included in said table collection is equivalent to one and only one tuple in the corresponding table of said data source ;
an actual rate of inclusion value stored with a select data source tuple and included in a directly-sampled table of said table collection represents the probability that a randomly selected table collection produced by the process will contain said select data source tuple .

US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
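For orientation only, the partitioning step recited in claim 35 can be sketched as follows. This is a minimal, single-process illustration under stated assumptions, not the patentee's implementation and not a claim construction: the function name `assign_partitions`, the group labels `rates` and `orders`, and the modulo routing rule are all hypothetical. It shows one way that some key-value pairs end up in more than one partition before each partition is handed to its own reducer.

```python
def assign_partitions(pairs, n_reducers, replicate_group):
    """Route (group, key, value) triples to reducer partitions.

    Assumption for illustration: pairs from `replicate_group` are copied to
    every partition, while all other pairs go to exactly one partition chosen
    by key modulo the reducer count.
    """
    partitions = [[] for _ in range(n_reducers)]
    for group, key, value in pairs:
        if group == replicate_group:
            targets = range(n_reducers)      # replicated to all partitions
        else:
            targets = [key % n_reducers]     # sent to a single partition
        for t in targets:
            partitions[t].append((group, key, value))
    return partitions


# Hypothetical intermediate data from two data groups.
intermediate = [
    ("rates", 0, 0.1),
    ("orders", 1, 20),
    ("orders", 2, 30),
    ("orders", 4, 5),
]

# Each resulting partition would be given to a separate reducer.
parts = assign_partitions(intermediate, n_reducers=2, replicate_group="rates")
```

In this sketch the single `rates` pair appears in both partitions, while each `orders` pair appears in exactly one. Replicating every pair to every partition would correspond to the narrower arrangement of claim 36.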
US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data (said sub) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (sampling rate) to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (original query) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040111410A1
CLAIM 1
. A computer-implemented information reservoir creation process wherein : a table collection is constructed from a data source ;
said table collection includes a subset of tables designated as sampling initiation tables ;
each table in said table collection is a member of either a directly-sampled table set or a descendent-sampled table set ;
said directly-sampled table set is characterized by tables that are either sampling initiation tables or ancestor tables to one or more sampling initiation tables ;
said descendant-sampled table set is characterized by tables that are descendant tables to a sampling initiation table ;
said table collection is characterized by a table collection schema equivalent to a data source schema of said data source , with the exception that a list of attributes for each table of said directly-sampled table set includes an additional attribute containing actual rate of inclusion value (output data set) s ;
each tuple included in said table collection is equivalent to one and only one tuple in the corresponding table of said data source ;
an actual rate of inclusion value stored with a select data source tuple and included in a directly-sampled table of said table collection represents the probability that a randomly selected table collection produced by the process will contain said select data source tuple .

US20040111410A1
CLAIM 22
. A computer-implemented method as claimed in claim 16 wherein knowledge of an anticipated workload is encoded into a first set (first set) of queries that are representative of said knowledge of said anticipated workload to derive weighting factors used to establish said target rates of inclusion .

US20040111410A1
CLAIM 49
. A computer-implemented method as claimed in claim 43 wherein said select tuple is sampled at the time said select tuple's corresponding table is sampled at a sampling rate (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data, s corresponding data partition) equal to the adjusted rate of inclusion .

US20040111410A1
CLAIM 53
. A computer-implemented method as claimed in claim 9 wherein said method further comprises : representing said sub (first data) set of said data source schema as a directed , acyclic graph having tables as vertices and table relationships as directed edges , said edges defining ancestor-descendant relationships between tuples in said data source ;
traversing said vertices of said acyclic graph ;
sampling each tuple associated with said vertices as each vertex is visited ;
copying each tuple selected through sampling into said representation ;
and optionally copying ancestor and descendant tuples associated with each tuple selected through sampling into said representation .

US20040111410A1
CLAIM 70
. A computer-implemented method as claimed in claim 9 further comprising maintaining the relative size of said representation by : identifying bounds for said representation ;
identifying a change to said data source ;
updating said representation based upon said change to said data source ;
performing a first set of operations if said representation is below said bounds comprising drawing a supplementary sample from said data source and joining said supplementary sample to said representation if deletions to said data source occur more frequently than additions to said data source ;
performing a second set (second set) of operations if said representation is within said bounds comprising allowing maintenance to said representation based upon said update ;
and performing a third set of operations if said representation is above said bounds comprising assigning a deletion inclusion probability to each tuple in said representation and subsampling said representation based upon said deletion inclusion probabilities .

US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US20040111410A1
CLAIM 84
. A system as claimed in claim 78 wherein said software implements an analyst component for : intercepting an original query (different schema) ;
remapping said original query into a format compatible with said representation ;
applying said remapped query against said representation ;
and providing the results of the remapped query in response to said original query .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040111410A1
CLAIM 1
. A computer-implemented information reservoir creation process wherein : a table collection is constructed from a data source ;
said table collection includes a subset of tables designated as sampling initiation tables ;
each table in said table collection is a member of either a directly-sampled table set or a descendent-sampled table set ;
said directly-sampled table set is characterized by tables that are either sampling initiation tables or ancestor tables to one or more sampling initiation tables ;
said descendant-sampled table set is characterized by tables that are descendant tables to a sampling initiation table ;
said table collection is characterized by a table collection schema equivalent to a data source schema of said data source , with the exception that a list of attributes for each table of said directly-sampled table set includes an additional attribute containing actual rate of inclusion value (output data set) s ;
each tuple included in said table collection is equivalent to one and only one tuple in the corresponding table of said data source ;
an actual rate of inclusion value stored with a select data source tuple and included in a directly-sampled table of said table collection represents the probability that a randomly selected table collection produced by the process will contain said select data source tuple .

US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20040111410A1
CLAIM 78
. A system for constructing a representation from a data source in order to provide response to queries related to information in said data source , wherein said data source has a plurality of tuples stored in said data source and a data source schema that includes defined relationships among at least a subset of the tuples in the data source , said system (data set, first data set, second data set) comprising : at least one processor ;
at least one storage device communicably coupled to said at least one processor arranged to store said data source and said representation ;
and software executable by said at least one processor for : creating said representation by copying at least a subset of said data source schema to define a representation schema ;
adding additional data to said representation that represents information that is not in said data source ;
defining tuples of interest within said data source and a degree of interest for each tuple of interest ;
sampling tuples from said tuples of interest into said representation based upon said degree of interest in a manner that preserves at least a subset of said relationships among tuples in the data source ;
and storing values in the representation that relate to the likelihood that each tuple sampled into said representation would be sampled into the representation if the sampling process were to be repeated .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
WO2005013139A1

Filed: 2003-09-23     Issued: 2005-02-10

A contents synchronization system in network environment and a method therefor

(Original Assignee) Nitgen Technologies Inc.     

Nam-Yul Lee, Kee-Joo Yoon
US8190610B2
CLAIM 1
. A method of processing data (other region) of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (management function, encryption function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (User Interface) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
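As a reading aid, the map and reduce steps recited in claim 1 can be sketched as follows. This is a hedged, single-process illustration rather than the patented implementation: the record layouts, the shared key `emp_id`, the group labels, and every function name are hypothetical, and ordinary function calls stand in for the distributed system. It shows two data groups with different schemas being partitioned and mapped by group-specific functions, intermediate data kept identifiable by group, and a reduce step that handles each group's values differently while merging on the common key.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas; "emp_id" is the key in common.
employees = [{"emp_id": 1, "name": "Ann"}, {"emp_id": 2, "name": "Bob"}]
bonuses = [
    {"emp_id": 1, "amount": 100},
    {"emp_id": 1, "amount": 50},
    {"emp_id": 2, "amount": 70},
]


def partition(records, n):
    """Split one data group's records into n data partitions."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Group-specific map: emit (emp_id, name) pairs."""
    return [(r["emp_id"], r["name"]) for r in part]


def map_bonus(part):
    """Group-specific map: emit (emp_id, amount) pairs."""
    return [(r["emp_id"], r["amount"]) for r in part]


def run_map(groups, n_partitions=2):
    """Build intermediate[key][group] -> list of values, tagged by data group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, (records, mapper) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in mapper(part):
                intermediate[key][group].append(value)
    return intermediate


def reduce_merge(key, by_group):
    """Process each group's values in its own way, then merge on the key."""
    names = by_group.get("employee", [])
    total = sum(by_group.get("bonus", []))
    return [(key, name, total) for name in names]


def map_reduce(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_merge(key, intermediate[key]))
    return output


result = map_reduce(
    {"employee": (employees, map_employee), "bonus": (bonuses, map_bonus)}
)
# result == [(1, "Ann", 150), (2, "Bob", 70)]
```

The output rows carry a schema (key, name, total) that differs from both input schemas, which is the kind of merged output the claim language describes.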
WO2005013139A1
CLAIM 1
. In network environment including LAN and WAN a Content Distribution Master (CD Master) that is a contents synchronization system transmitting the modified contents of source data servers to target servers , said CD Master comprising a Content Distribution Master server (CD Master server) , a Content Monitoring System server (CMS server) , a Content Agent System server (CAS server) , a Server Monitoring Agent server (SM Agent server) , a Content Distribution Master Admin Tool (CD Master Admin Tool) and an authentic server , wherein said CD Master server manages data distribution and data transmission and controls the service circumstances of said CMS server , CAS server , SM Agent server and monitors the data transmission status and the status of said CMS server , CAS server , SM Agent server , CD Master Admin Tool , authentic server ;
said CMS server monitors in real time at the operating system level whether the data of folders designated by a network manager are created , modified or deleted , and notifies the modified contents to said CD Master server ;
said CAS server transmits data to other CAS servers or receives data from other CAS servers according to the instruction of said CD Master server ;
said SM Agent server collects server status information about CPU , Memory , Session number of the installed CD Master server , CMS server , CAS server , CD Master Admin Tool , authentic server every constant time interval periodically and notifies the collected information to said CD Master server ;
said CD Master Admin Tool of GUI (Graphic User Interface (different intermediate data) ) environment being independent from operating system platform based on the development in Java environment and is a management tool to support that said CD Master system manager sets CD Master service environment easily and provides intuitive interface and sets and confirms the service environments including service server management , environment setting between said CMS server and CAS server , manager's account management , server status monitoring , scheduling , synchronization , server monitor agent setting , job log confirmation , operating environment setting through CD Master server ;
and said authentic server is a license system of said contents synchronization system and issues and manages CD Master License Keys and classifies servers as tree-structured three levels of Region , Group , Server for effective contents synchronization among servers grouped based on network topology being served actually , wherein Region is the highest level , Group is a medium level , and Server is a lowest level , and manages Region , Group , CAS server , CMS server and the restriction of the usable days for operating said synchronization system .

WO2005013139A1
CLAIM 2
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said CD Master performing : a data filtering function that includes or excludes data and uses regular expression method , wherein the include helps said CD Master to make a manager specify the kind of data to transmit to target servers and only transmit a specific kind of data to target servers , and the exclude helps CD Master to make a manager exclude a specific kind of data files from the transmission ;
a multi contents generating function ;
a multiple data transmission method function including real time t (first data, first data group) ransmission , manual transmission , reserved transmission ;
a multi data transmission path setting function ;
and a data transmission fail-over function by network failure .

WO2005013139A1
CLAIM 4
. A contents synchronization system as set forth in claim 2 , wherein it is characterized in that said multi contents generating function makes it possible for said CD Master to designate the synchronization timing of data diversely and to perform synchronization and backup of data by transmitting data in multiple source servers to all target servers and for every changed data of a specific data center to be transmitted to all target servers of another network center or another region (processing data, intermediate data processing) , wherein in case N and M are arbitrary natural numbers , the transmission service is called as N : M type data transmission service .

WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function (mapping functions) of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

WO2005013139A1
CLAIM 20
. A contents synchronization method as set forth in any one of claims 16 to 18 , wherein it is characterized in that in case of data transmission among the CAS servers compressed data transmission function is performed and in case of contents synchronization the compressed data transmission function compresses and encodes data and reduces network's load , said file transmission is specified in consideration of characteristics of network structure including International Region that does not belong to Same Region , Same Group , in case of the file transmission the whole files are dump copied but in case of frequently updated files only the changed parts of files are transmitted after comparison of files , wherein the transmission method is called as different patch , contents synchronization is performed by defining whether encryption is used or not by combination of AND conditions , or at the same time by specifying whether SSL encryption is used and whether dump copy or different patch is used , the manager sets predetermined multi-level compression rates in consideration of network bandwidth of each transmission section , the CD Master server supports packet encryption of transmission data using SSL and previously intercepts information leakage through hacking by using encryption function (mapping functions) in order to protect important data and contents of enterprises and persons , and it is possible to define SSL encryption section selectively among the whole sections of source servers and target servers , wherein by reflecting network status of LAN and WAN sections to the maximum and setting , transmission rate increases and data is protected safely .

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task (current status) ;

the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status (combine task) of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (predetermined times) group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
WO2005013139A1
CLAIM 7
. A contents synchronization system as set forth in claim 2 , wherein it is characterized in that in case of network's failure , the data transmission fail-over function makes it possible for said CD Master to transmit data via bypass by preparing for a case of no-transmission of data in a specific section and by monitoring the no-transmission , wherein it is checked whether the provided bypass belongs to the same Group , to the same Region or to an international Region among Region , Group and Server of an existing structured network and data is transmitted again from a nearest CAS server and in case data cannot be transmitted from every CAS server because of a fatal problem of a server , and the transmission is retried a predetermined times (particular data) and if the result of transmission of the CAS server is fail , said CD Master server performs contents synchronization for corresponding target servers according to the recovery procedure of preset target servers in case of failure recovery of target servers in which the corresponding failure occurred and in the procedure it should be set selectively according to the circumstances whether contents synchronization should be performed at once after server's failure is recovered , or contents synchronization should be performed at a reserved time which a manager designated , or contents synchronization of target servers having failure transmission should be performed manually
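
For orientation, the per-group iterator limitation of US8190610B2 claim 12 above can be illustrated with a minimal Python sketch. It is not taken from the patent or from the cited reference. The group names (employee, department), the field names, and the helper functions are all hypothetical, and the sketch assumes the intermediate values have already been grouped under one common key.

```python
from operator import itemgetter

# Hypothetical intermediate values for one common key (dept_id = 7),
# already separated by the data group that produced them.
intermediate = {
    "employee": [{"emp_id": 3, "name": "Cy"}, {"emp_id": 1, "name": "Ann"}],
    "department": [{"site": "SJC", "budget": 90}, {"site": "AUS", "budget": 40}],
}

# Assumption: each group's iterator is ordered by a key from its own schema.
ITERATOR_KEYS = {"employee": "emp_id", "department": "site"}


def group_iterator(group, values):
    """Return an iterator over one group's values, ordered by that group's key."""
    return iter(sorted(values, key=itemgetter(ITERATOR_KEYS[group])))


def reduce_one_key(common_key, grouped_values):
    """Combine the two groups for one key, using a separate iterator per group."""
    iters = {g: group_iterator(g, vs) for g, vs in grouped_values.items()}
    departments = list(iters["department"])  # materialized so it can be re-read
    return [
        {"dept_id": common_key, "emp_id": e["emp_id"], "site": d["site"]}
        for e in iters["employee"]
        for d in departments
    ]


rows = reduce_one_key(7, intermediate)
```

In this sketch the employee iterator walks records by emp_id while the department iterator walks records by site, so the two iterators inside the same reducer follow different keys drawn from different schemas.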

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing (other region) step of the reducing step further comprises processing data (other region) that is not intermediate data .
WO2005013139A1
CLAIM 4
. A contents synchronization system as set forth in claim 2 , wherein it is characterized in that said multi contents generating function makes it possible for said CD Master to designate the synchronization timing of data diversely and to perform synchronization and backup of data by transmitting data in multiple source servers to all target servers and for every changed data of a specific data center to be transmitted to all target servers of another network center or another region (processing data, intermediate data processing) , wherein in case N and M are arbitrary natural numbers , the transmission service is called as N : M type data transmission service .
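
As a reading aid for US8190610B2 claim 13 above, the following minimal Python sketch shows a reduce step that consults data which did not come from the map phase. The function name, the side-data dictionary, and the multiplier semantics are assumptions made for illustration only.

```python
def reduce_with_side_data(key, intermediate_values, side_data):
    """Sum the intermediate values for a key, then scale by non-intermediate data.

    `side_data` is a hypothetical lookup table that was not produced by any
    map function; keys missing from it default to a multiplier of 1.
    """
    total = sum(intermediate_values)
    return key, total * side_data.get(key, 1)


result = reduce_with_side_data("EUR", [10, 20], {"EUR": 2})
```

Here the list of values is the intermediate data, while the lookup table stands in for the "data that is not intermediate data" processed during reducing.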

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (management function, encryption function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (User Interface) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
WO2005013139A1
CLAIM 1
. In network environment including LAN and WAN a Content Distribution Master (CD Master) that is a contents synchronization system transmitting the modified contents of source data servers to target servers , said CD Master comprising a Content Distribution Master server (CD Master server) , a Content Monitoring System server (CMS server) , a Content Agent System server (CAS server) , a Server Monitoring Agent server (SM Agent server) , a Content Distribution Master Admin Tool (CD Master Admin Tool) and an authentic server , wherein said CD Master server manages data distribution and data transmission and controls the service circumstances of said CMS server , CAS server , SM Agent server and monitors the data transmission status and the status of said CMS server , CAS server , SM Agent server , CD Master Admin Tool , authentic server ;
said CMS server monitors in real time at the operating system level whether the data of folders designated by a network manager are created , modified or deleted , and notifies the modified contents to said CD Master server ;
said CAS server transmits data to other CAS servers or receives data from other CAS servers according to the instruction of said CD Master server ;
said SM Agent server collects server status information about CPU , Memory , Session number of the installed CD Master server , CMS server , CAS server , CD Master Admin Tool , authentic server every constant time interval periodically and notifies the collected information to said CD Master server ;
said CD Master Admin Tool of GUI (Graphic User Interface (different intermediate data) ) environment being independent from operating system platform based on the development in Java environment and is a management tool to support that said CD Master system manager sets CD Master service environment easily and provides intuitive interface and sets and confirms the service environments including service server management , environment setting between said CMS server and CAS server , manager's account management , server status monitoring , scheduling , synchronization , server monitor agent setting , job log confirmation , operating environment setting through CD Master server ;
and said authentic server is a license system of said contents synchronization system and issues and manages CD Master License Keys and classifies servers as tree-structured three levels of Region , Group , Server for effective contents synchronization among servers grouped based on network topology being served actually , wherein Region is the highest level , Group is a medium level , and Server is a lowest level , and manages Region , Group , CAS server , CMS server and the restriction of the usable days for operating said synchronization system .

WO2005013139A1
CLAIM 2
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said CD Master performing : a data filtering function that includes or excludes data and uses regular expression method , wherein the include helps said CD Master to make a manager specify the kind of data to transmit to target servers and only transmit a specific kind of data to target servers , and the exclude helps CD Master to make a manager exclude a specific kind of data files from the transmission ;
a multi contents generating function ;
a multiple data transmission method function including real time t (first data, first data group) ransmission , manual transmission , reserved transmission ;
a multi data transmission path setting function ;
and a data transmission fail-over function by network failure .

WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function (mapping functions) of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

WO2005013139A1
CLAIM 20
. A contents synchronization method as set forth in any one of claims 16 to 18 , wherein it is characterized in that in case of data transmission among the CAS servers compressed data transmission function is performed and in case of contents synchronization the compressed data transmission function compresses and encodes data and reduces network's load , said file transmission is specified in consideration of characteristics of network structure including International Region that does not belong to Same Region , Same Group , in case of the file transmission the whole files are dump copied but in case of frequently updated files only the changed parts of files are transmitted after comparison of files , wherein the transmission method is called as different patch , contents synchronization is performed by defining whether encryption is used or not by combination of AND conditions , or at the same time by specifying whether SSL encryption is used and whether dump copy or different patch is used , the manager sets predetermined multi-level compression rates in consideration of network bandwidth of each transmission section , the CD Master server supports packet encryption of transmission data using SSL and previously intercepts information leakage through hacking by using encryption function (mapping functions) in order to protect important data and contents of enterprises and persons , and it is possible to define SSL encryption section selectively among the whole sections of source servers and target servers , wherein by reflecting network status of LAN and WAN sections to the maximum and setting , transmission rate increases and data is protected safely .
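
For comparison with the mapped reference language, the operations recited in US8190610B2 claim 17 above (per-group partitioning and mapping, group-identifiable intermediate data, and a reduce step that merges on a common key) can be sketched in Python as follows. This is a single-process illustration under stated assumptions, not the patented implementation. The employee and department records, the field names, and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: two data groups with different schemas that share dept_id.
employees = [
    ("e1", {"name": "Ann", "dept_id": 10}),
    ("e2", {"name": "Bo", "dept_id": 20}),
    ("e3", {"name": "Cy", "dept_id": 10}),
]
departments = [
    ("d10", {"dept_id": 10, "title": "Search"}),
    ("d20", {"dept_id": 20, "title": "Ads"}),
]


def map_employee(_key, record):
    """Map function for the employee group: emit (dept_id, name)."""
    yield record["dept_id"], record["name"]


def map_department(_key, record):
    """Map function for the department group: emit (dept_id, title)."""
    yield record["dept_id"], record["title"]


def partition(records, n):
    """Split one group's records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def run_map(group, records, map_fn, n_partitions=2):
    """Map each partition independently and tag output with its group."""
    out = []
    for part in partition(records, n_partitions):
        for key, record in part:
            for ikey, ivalue in map_fn(key, record):
                out.append((ikey, group, ivalue))
    return out


def reduce_merge(intermediate):
    """Group intermediate data by key and by group, then merge on the key."""
    grouped = defaultdict(lambda: defaultdict(list))
    for key, group, value in intermediate:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        for title in grouped[key]["department"]:
            for name in sorted(grouped[key]["employee"]):
                output.append((key, title, name))
    return output


intermediate = run_map("employee", employees, map_employee) + run_map(
    "department", departments, map_department
)
merged = reduce_merge(intermediate)
```

The group tag carried with each intermediate pair is what lets the reduce step treat each group's values in a group-specific way before merging them on the shared key.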

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (predetermined times) group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
WO2005013139A1
CLAIM 7
. A contents synchronization system as set forth in claim 2 , wherein it is characterized in that in case of network's failure , the data transmission fail-over function makes it possible for said CD Master to transmit data via bypass by preparing for a case of no-transmission of data in a specific section and by monitoring the no-transmission , wherein it is checked whether the provided bypass belongs to the same Group , to the same Region or to an international Region among Region , Group and Server of an existing structured network and data is transmitted again from a nearest CAS server and in case data cannot be transmitted from every CAS server because of a fatal problem of a server , and the transmission is retried a predetermined times (particular data) and if the result of transmission of the CAS server is fail , said CD Master server performs contents synchronization for corresponding target servers according to the recovery procedure of preset target servers in case of failure recovery of target servers in which the corresponding failure occurred and in the procedure it should be set selectively according to the circumstances whether contents synchronization should be performed at once after server's failure is recovered , or contents synchronization should be performed at a reserved time which a manager designated , or contents synchronization of target servers having failure transmission should be performed manually

US8190610B2
CLAIM 26
. The computer system of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task (current status) ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status (combine task) of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing (other region) of the reducing further comprises processing data (other region) that is not intermediate data .
WO2005013139A1
CLAIM 4
. A contents synchronization system as set forth in claim 2 , wherein it is characterized in that said multi contents generating function makes it possible for said CD Master to designate the synchronization timing of data diversely and to perform synchronization and backup of data by transmitting data in multiple source servers to all target servers and for every changed data of a specific data center to be transmitted to all target servers of another network center or another region (processing data, intermediate data processing) , wherein in case N and M are arbitrary natural numbers , the transmission service is called as N : M type data transmission service .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (other region) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data (time t) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (management function, encryption function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
WO2005013139A1
CLAIM 2
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said CD Master performing : a data filtering function that includes or excludes data and uses regular expression method , wherein the include helps said CD Master to make a manager specify the kind of data to transmit to target servers and only transmit a specific kind of data to target servers , and the exclude helps CD Master to make a manager exclude a specific kind of data files from the transmission ;
a multi contents generating function ;
a multiple data transmission method function including real time t (first data, first data group) ransmission , manual transmission , reserved transmission ;
a multi data transmission path setting function ;
and a data transmission fail-over function by network failure .

WO2005013139A1
CLAIM 4
. A contents synchronization system as set forth in claim 2 , wherein it is characterized in that said multi contents generating function makes it possible for said CD Master to designate the synchronization timing of data diversely and to perform synchronization and backup of data by transmitting data in multiple source servers to all target servers and for every changed data of a specific data center to be transmitted to all target servers of another network center or another region (processing data, intermediate data processing) , wherein in case N and M are arbitrary natural numbers , the transmission service is called as N : M type data transmission service .

WO2005013139A1
CLAIM 11
. A contents synchronization system as set forth in claim 10 , wherein it is characterized in that said server (second set) monitoring information is got by said SM Agent server and referenced in transmission for synchronization of said CD Master and if failure of a CAS server of contents synchronization path is monitored , said CD Master performs contents synchronization for other CAS servers except the corresponding CAS server and in case the corresponding CAS server , in which the failure has occurred is recovered later , then the synchronization is performed by a CAS server in neighboring other path and in case as a result of monitoring by said SM Agent server , server's physical problems or software problems including PING failure , failure of each PORT monitoring , Agent response failure , or load of CPU , Memory and Session are monitored , said CD Master server notifies the monitored results to a manager through alarm information , SMS , E-mail by using CD Master Admin Tool and makes the manager check the status of servers and respond rapidly for failures .

WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function (mapping functions) of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

WO2005013139A1
CLAIM 20
. A contents synchronization method as set forth in any one of claims 16 to 18 , wherein it is characterized in that in case of data transmission among the CAS servers compressed data transmission function is performed and in case of contents synchronization the compressed data transmission function compresses and encodes data and reduces network's load , said file transmission is specified in consideration of characteristics of network structure including International Region that does not belong to Same Region , Same Group , in case of the file transmission the whole files are dump copied but in case of frequently updated files only the changed parts of files are transmitted after comparison of files , wherein the transmission method is called as different patch , contents synchronization is performed by defining whether encryption is used or not by combination of AND conditions , or at the same time by specifying whether SSL encryption is used and whether dump copy or different patch is used , the manager sets predetermined multi-level compression rates in consideration of network bandwidth of each transmission section , the CD Master server supports packet encryption of transmission data using SSL and previously intercepts information leakage through hacking by using encryption function (mapping functions) in order to protect important data and contents of enterprises and persons , and it is possible to define SSL encryption section selectively among the whole sections of source servers and target servers , wherein by reflecting network status of LAN and WAN sections to the maximum and setting , transmission rate increases and data is protected safely .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .
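
To make the partitioning limitation of US8190610B2 claim 35 above concrete, the following Python sketch routes intermediate key-value pairs to reducer partitions and copies the pairs of one group into every partition. The modulo routing rule, the "lookup" group, and the sample pairs are assumptions chosen for illustration. Neither the patent nor the cited reference prescribes them.

```python
def partition_intermediate(pairs, n_reducers, broadcast_groups):
    """Assign (key, group, value) pairs to reducer partitions.

    Pairs from a group listed in `broadcast_groups` are copied into every
    partition; all other pairs go to exactly one partition chosen by
    key modulo n_reducers (integer keys are assumed).
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, group, value in pairs:
        if group in broadcast_groups:
            for part in partitions:
                part.append((key, group, value))
        else:
            partitions[key % n_reducers].append((key, group, value))
    return partitions


def reducer(partition):
    """Toy reducer: collect the employee names seen in this partition."""
    return sorted(value for _key, group, value in partition if group == "employee")


# Hypothetical intermediate data from two groups.
pairs = [
    (10, "employee", "Ann"),
    (20, "employee", "Bo"),
    (11, "employee", "Cy"),
    (0, "lookup", "rate=2"),
]

partitions = partition_intermediate(pairs, n_reducers=2, broadcast_groups={"lookup"})
results = [reducer(part) for part in partitions]  # one reducer per partition
```

Listing every group in `broadcast_groups` would send all pairs to all reducers, which corresponds to the variant recited in dependent claim 36.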

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task (current status) , the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status (combine task) of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data (time t) group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (management function, encryption function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
WO2005013139A1
CLAIM 2
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said CD Master performing : a data filtering function that includes or excludes data and uses regular expression method , wherein the include helps said CD Master to make a manager specify the kind of data to transmit to target servers and only transmit a specific kind of data to target servers , and the exclude helps CD Master to make a manager exclude a specific kind of data files from the transmission ;
a multi contents generating function ;
a multiple data transmission method function including real time transmission (first data, first data group) , manual transmission , reserved transmission ;
a multi data transmission path setting function ;
and a data transmission fail-over function by network failure .

WO2005013139A1
CLAIM 11
. A contents synchronization system as set forth in claim 10 , wherein it is characterized in that said server (second set) monitoring information is got by said SM Agent server and referenced in transmission for synchronization of said CD Master and if failure of a CAS server of contents synchronization path is monitored , said CD Master performs contents synchronization for other CAS servers except the corresponding CAS server and in case the corresponding CAS server , in which the failure has occurred is recovered later , then the synchronization is performed by a CAS server in neighboring other path and in case as a result of monitoring by said SM Agent server , server's physical problems or software problems including PING failure , failure of each PORT monitoring , Agent response failure , or load of CPU , Memory and Session are monitored , said CD Master server notifies the monitored results to a manager through alarm information , SMS , E-mail by using CD Master Admin Tool and makes the manager check the status of servers and respond rapidly for failures .

WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function (mapping functions) of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .

WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

WO2005013139A1
CLAIM 20
. A contents synchronization method as set forth in any one of claims 16 to 18 , wherein it is characterized in that in case of data transmission among the CAS servers compressed data transmission function is performed and in case of contents synchronization the compressed data transmission function compresses and encodes data and reduces network's load , said file transmission is specified in consideration of characteristics of network structure including International Region that does not belong to Same Region , Same Group , in case of the file transmission the whole files are dump copied but in case of frequently updated files only the changed parts of files are transmitted after comparison of files , wherein the transmission method is called as different patch , contents synchronization is performed by defining whether encryption is used or not by combination of AND conditions , or at the same time by specifying whether SSL encryption is used and whether dump copy or different patch is used , the manager sets predetermined multi-level compression rates in consideration of network bandwidth of each transmission section , the CD Master server supports packet encryption of transmission data using SSL and previously intercepts information leakage through hacking by using encryption function (mapping functions) in order to protect important data and contents of enterprises and persons , and it is possible to define SSL encryption section selectively among the whole sections of source servers and target servers , wherein by reflecting network status of LAN and WAN sections to the maximum and setting , transmission rate increases and data is protected safely .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .
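
For orientation when weighing the mapping above, the following is a minimal Python sketch of the kind of processing that claims 40 and 41 of US8190610B2 recite: two data sets with different schemas are mapped independently, and their intermediate key-value pairs are reduced together on a key they have in common so that the output merges parts of both. It is an illustration under stated assumptions, not the patentee's implementation. The data sets, the field names (`dept_id`, `name`, `dept_name`), and all function names are hypothetical, and the sketch runs in a single process rather than on a distributed system.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key "dept_id".
employees = [
    {"emp_id": 1, "name": "Ann", "dept_id": 10},
    {"emp_id": 2, "name": "Bob", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Search"},
    {"dept_id": 20, "dept_name": "Ads"},
]


def map_employee(record):
    # Map function for the first data group: emit (common key, value).
    yield record["dept_id"], record["name"]


def map_department(record):
    # Map function for the second data group: a different mapping, same key.
    yield record["dept_id"], record["dept_name"]


def map_phase(groups):
    # Keep intermediate values separated per data group under each key,
    # so the intermediate data stays identifiable to its group.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, (records, map_fn) in groups.items():
        for record in records:
            for key, value in map_fn(record):
                intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    # Reduce both groups together for one key; the output schema differs
    # from both input schemas.
    for dept_name in values_by_group.get("department", []):
        for name in values_by_group.get("employee", []):
            yield {"dept_id": key, "dept_name": dept_name, "employee": name}


def grouped_map_reduce(groups, reduce_fn):
    intermediate = map_phase(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_fn(key, intermediate[key]))
    return output


result = grouped_map_reduce(
    {
        "employee": (employees, map_employee),
        "department": (departments, map_department),
    },
    reduce_join,
)
```

With the hypothetical inputs above, `result` holds one merged row per employee, each carrying the department name joined on `dept_id`. A reference that only keeps copies of identical content in sync across servers would not, on its face, show this per-key merging of differently structured data.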

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
WO2005013139A1
CLAIM 15
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that said system (data set, first data set, second data set) keeps contents of all server groups identical with each other in order to provide services according to the objects for multiple server groups clustered through load balancer installed with switching facilities , wherein said server groups have the same objects .
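
As a point of comparison for claims 42 and 43 of US8190610B2, the sketch below shows one way intermediate key-value pairs could be partitioned across several reducers, with the pairs of selected data groups copied to every partition. This is a hedged illustration only. The function names, the `broadcast_groups` parameter, and the CRC32-based partitioner are assumptions introduced here and are not taken from the patent or from the cited reference.

```python
import zlib


def partition_index(key, num_reducers):
    # Deterministic hash so a given key always lands in the same partition.
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def partition_intermediate(pairs, num_reducers, broadcast_groups=frozenset()):
    """Split (group, key, value) triples into one partition per reducer.

    Pairs whose group is in broadcast_groups are provided to more than one
    partition (here: all of them); other pairs go to exactly one partition.
    """
    partitions = [[] for _ in range(num_reducers)]
    for group, key, value in pairs:
        if group in broadcast_groups:
            for part in partitions:
                part.append((group, key, value))
        else:
            partitions[partition_index(key, num_reducers)].append((group, key, value))
    return partitions


# Hypothetical intermediate data from two data groups "A" and "B".
pairs = [("A", 1, "x"), ("A", 2, "y"), ("B", 1, "z")]

# Claim 42 style: some pairs (group "B") reach more than one partition.
some_replicated = partition_intermediate(pairs, 3, broadcast_groups={"B"})

# Claim 43 style: all pairs of both groups reach all reducers.
all_replicated = partition_intermediate(pairs, 3, broadcast_groups={"A", "B"})
```

Replicating a small data group to every reducer is a common way to let each reducer combine it with its own slice of a larger group. Whether a system that mirrors whole server contents discloses this key-based partitioning is a separate question for detailed charting.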

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task (current status) , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
WO2005013139A1
CLAIM 12
. A contents synchronization system as set forth in claim 1 , wherein it is characterized in that a management function of said service servers makes it possible through said CD Master Admin Tool that a network manager adds a server newly to be a service object of said CD Master and modifies and deletes the environments of existing registered servers ;
an environment setting function of said CMS server and CAS servers makes it possible for said CMS server to set files and folders to be monitored and to set path to store data received from CAS servers installed in other servers ;
a management function of said manager account creates , modifies , deletes the account and information of the manager with that the access to said CD Master Admin Tool is possible ;
said server monitoring function shows the current status (combine task) of registered service servers in forms of graph and table ;
said work log confirmation function makes it possible to confirm all job log about all synchronization jobs , manual jobs , reserved jobs that are performed under control of said CD Master server ;
and said management function is an application of GUI environment that can operate independently from platforms of operating system , and makes it possible to manage network easily and simply .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005099107A

Filed: 2003-09-22     Issued: 2005-04-14

Data reproduction device (データ再生装置)

(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社     

Hiroto Endo, 博人 遠藤
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (動作状態) are performed by a distributed system .
JP2005099107A
CLAIM 1
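A data reproduction device comprising: a memory [メモリ] (different schema) that stores video or audio data as a file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; input means for operating the reproduction function and the fast-forward or rewind function; file recognition means for recognizing the size of the file; and a display that displays the operating state [動作状態] (reducing operations) of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the file.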

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2005099107A
CLAIM 1
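A data reproduction device comprising: a memory [メモリ] (different schema) that stores video or audio data as a file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; input means for operating the reproduction function and the fast-forward or rewind function; file recognition means for recognizing the size of the file; and a display that displays the operating state of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the file.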

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005099107A
CLAIM 1
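A data reproduction device comprising: a memory [メモリ] (different schema) that stores video or audio data as a file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; input means for operating the reproduction function and the fast-forward or rewind function; file recognition means for recognizing the size of the file; and a display that displays the operating state of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the file.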

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2005099107A
CLAIM 1
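A data reproduction device comprising: a memory [メモリ] (different schema) that stores video or audio data as a file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; input means for operating the reproduction function and the fast-forward or rewind function; file recognition means for recognizing the size of the file; and a display that displays the operating state of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the file.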

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (メモリ) over a computer system , the method comprising : for a first data set (選択手段) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (選択手段) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (動作状態) are performed by a distributed system .
JP2005099107A
CLAIM 1
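A data reproduction device comprising: a memory [メモリ] (different schema) that stores video or audio data as a file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; input means for operating the reproduction function and the fast-forward or rewind function; file recognition means for recognizing the size of the file; and a display that displays the operating state [動作状態] (reducing operations) of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the file.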

JP2005099107A
CLAIM 2
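A data reproduction device comprising: a memory that stores video or audio data as a file; dividing means for dividing the file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; file selection means [ファイル選択手段] (first set, first data set) for selecting, from among the files divided by the dividing means, a file to be reproduced by the data reading means; file recognition means for recognizing the size of the file selected by the file selection means; input means for operating the reproduction function and the fast-forward or rewind function; and a display that displays the operating state of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the divided file.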

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set (選択手段) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (選択手段) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (メモリ) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005099107A
CLAIM 1
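A data reproduction device comprising: a memory [メモリ] (different schema) that stores video or audio data as a file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; input means for operating the reproduction function and the fast-forward or rewind function; file recognition means for recognizing the size of the file; and a display that displays the operating state of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the file.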

JP2005099107A
CLAIM 2
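A data reproduction device comprising: a memory that stores video or audio data as a file; dividing means for dividing the file; data reading means having a reproduction function for the data and a fast-forward or rewind function for the data; file selection means [ファイル選択手段] (first set, first data set) for selecting, from among the files divided by the dividing means, a file to be reproduced by the data reading means; file recognition means for recognizing the size of the file selected by the file selection means; input means for operating the reproduction function and the fast-forward or rewind function; and a display that displays the operating state of the data reading means, wherein the operating speed of the fast-forward or rewind function is controlled according to the size of the divided file.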




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2005092707A

Filed: 2003-09-19     Issued: 2005-04-07

Similarity calculation system, similarity calculation program, and similarity calculation method (類似度算出システムおよび類似度算出プログラム、並びに類似度算出方法)

(Original Assignee) Seiko Epson Corp; セイコーエプソン株式会社     

Atsuji Nagahara, Hirotaka Ohashi, 洋貴 大橋, 敦示 永原
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005092707A
CLAIM 1
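A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.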

JP2005092707A
CLAIM 6
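A program for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation program being characterized by being a program [ためのプログラム] (corresponding different intermediate data) for causing a computer [コンピュータ] (processing data) to execute processing implemented as layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.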

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2005092707A
CLAIM 1
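A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.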

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2005092707A
CLAIM 1
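A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.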

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2005092707A
CLAIM 1
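A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.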
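
Claims 3 and 4 of US8190610B2 turn on intermediate data being "identifiable to the data group", either through metadata tied to the manner of storage or through data stored alongside it. The following Python sketch illustrates both ideas under loudly stated assumptions: the directory-per-group layout, the file name `part-00000.json`, and the JSON payload format are hypothetical choices made here for illustration and do not come from the patent.

```python
import json
import os
import tempfile


def store_intermediate(base_dir, group, pairs):
    # Manner of storage identifies the group: one directory per data group.
    group_dir = os.path.join(base_dir, group)
    os.makedirs(group_dir, exist_ok=True)
    path = os.path.join(group_dir, "part-00000.json")
    with open(path, "w", encoding="utf-8") as handle:
        # The group name is also stored in association with the pairs.
        json.dump({"group": group, "pairs": pairs}, handle)
    return path


def load_group(path):
    # Recover the group identity and the key-value pairs from storage.
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    return payload["group"], [tuple(pair) for pair in payload["pairs"]]


with tempfile.TemporaryDirectory() as base_dir:
    stored_path = store_intermediate(base_dir, "employee", [[10, "Ann"], [20, "Bob"]])
    group_from_path = os.path.basename(os.path.dirname(stored_path))
    group_from_payload, loaded_pairs = load_group(stored_path)
```

Either signal (the storage location or the stored tag) lets a later reduce step tell which data group a given key-value pair came from.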

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2005092707A
CLAIM 1
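A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.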

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2005092707A
CLAIM 1
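A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.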

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005092707A
CLAIM 1
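A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.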
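
Claims 5 and 12 of US8190610B2 recite a reducer that employs a separate iterator for each data group, where the iterators operate according to different keys of different schemas. A minimal Python sketch of that arrangement follows. The group names (`customers`, `orders`), the field names, and the `sort_keys` parameter are hypothetical and chosen only to make the idea concrete; the sketch is not presented as the patented implementation.

```python
from operator import itemgetter


def group_iterator(values, sort_key):
    # One iterator per data group, ordered by a field of that group's schema.
    return iter(sorted(values, key=itemgetter(sort_key)))


def reduce_with_group_iterators(key, values_by_group, sort_keys):
    # Build a dedicated iterator for each data group present under this key.
    iterators = {
        group: group_iterator(values, sort_keys[group])
        for group, values in values_by_group.items()
    }
    # Materialize the inner group once so it can be rescanned per outer row.
    orders = list(iterators["orders"])
    for customer in iterators["customers"]:
        for order in orders:
            yield (key, customer["name"], order["amount"])


# Hypothetical intermediate values for a single common key (42).
values_by_group = {
    "customers": [{"name": "Zoe", "since": 2004}, {"name": "Al", "since": 2001}],
    "orders": [{"order_id": 7, "amount": 30}, {"order_id": 3, "amount": 12}],
}
sort_keys = {"customers": "since", "orders": "order_id"}

rows = list(reduce_with_group_iterators(42, values_by_group, sort_keys))
```

Here the `customers` iterator walks its values by `since` while the `orders` iterator walks its values by `order_id`, so each group is traversed according to a field that exists only in its own schema.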

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2005092707A
CLAIM 6
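A program for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation program being characterized by being a program for causing a computer [コンピュータ] (processing data) to execute processing implemented as layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.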

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005092707A
CLAIM 1
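A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.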

JP2005092707A
CLAIM 6
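A program for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation program being characterized by being a program [ためのプログラム] (corresponding different intermediate data) for causing a computer to execute processing implemented as layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.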

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2005092707A
CLAIM 1
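A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.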

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005092707A
CLAIM 1
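A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.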

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005092707A
CLAIM 1
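A system for calculating the similarity of layout results in which information storage frames are arranged in a predetermined layout region in association with layout attributes of the information storage frames, the similarity calculation system being characterized by comprising [備えること] (data group, first data group) layout result similarity calculation means for calculating the similarity based on the layout attributes of the layout result serving as a comparison target and the layout attributes of the layout result serving as a comparison source.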

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005092707A
CLAIM 1
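A similarity calculation system that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the system characterized by comprising [備えること] (data group, first data group) a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.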

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005092707A
CLAIM 1
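A similarity calculation system that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the system characterized by comprising [備えること] (data group, first data group) a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.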

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005092707A
CLAIM 1
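A similarity calculation system that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the system characterized by comprising [備えること] (data group, first data group) a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.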

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2005092707A
CLAIM 6
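A similarity calculation program that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the program characterized by being a program that causes a computer [コンピュ] (processing data) to execute processing implemented as a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.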

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005092707A
CLAIM 1
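A similarity calculation system that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the system characterized by comprising [備えること] (data group, first data group) a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.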

JP2005092707A
CLAIM 6
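A similarity calculation program that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the program characterized by being a program that causes a computer [コンピュ] (processing data) to execute processing implemented as a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.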
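For orientation, the method recited in claim 33 of US8190610B2 above (two data sets with different schemas, mapped independently, then reduced together on a common key) can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration: the sample records, the group labels "emp" and "dept", the round-robin partitioner, and all function names are hypothetical and are not taken from the patent or from any cited reference. The claim recites a distributed system; this sketch runs locally.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share the key "dept_id".
EMPLOYEES = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cid", 10)]  # (emp_id, name, dept_id)
DEPARTMENTS = [(10, "Search"), (20, "Ads")]                    # (dept_id, dept_name)


def map_employee(record):
    """Map function for the first group: emit (dept_id, name)."""
    _emp_id, name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Map function for the second group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def partition(records, n):
    """Split a data set into n partitions (round-robin, illustration only)."""
    return [records[i::n] for i in range(n)]


def run_map(intermediate, group, records, map_fn, n_partitions=2):
    """Apply map_fn to every partition and tag each output with its group."""
    for part in partition(records, n_partitions):
        for record in part:
            for key, value in map_fn(record):
                intermediate[key][group].append(value)


def reduce_join(key, emp_iter, dept_iter):
    """Reduce with one iterator per group; the output has a third schema."""
    dept_names = list(dept_iter)
    for name in emp_iter:
        for dept_name in dept_names:
            yield (name, dept_name, key)


def map_reduce_join():
    """Run both map phases, then reduce key by key over both groups."""
    intermediate = defaultdict(lambda: defaultdict(list))
    run_map(intermediate, "emp", EMPLOYEES, map_employee)
    run_map(intermediate, "dept", DEPARTMENTS, map_department)
    output = []
    for key in sorted(intermediate):
        groups = intermediate[key]
        output.extend(reduce_join(key, iter(groups["emp"]), iter(groups["dept"])))
    return output
```

Calling map_reduce_join() returns rows of the form (name, dept_name, dept_id), a schema that differs from both input schemas. Tagging intermediate values with their group is one possible way to keep them "identifiable to that data group"; other designs are equally plausible.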

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005092707A
CLAIM 1
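A similarity calculation system that calculates the similarity of layout results, in each of which information storage frames are arranged in a predetermined layout area in association with the layout attributes of those information storage frames, the system characterized by comprising [備えること] (data group, first data group) a layout result similarity calculation means that calculates the similarity based on the layout attributes of the layout result serving as the comparison target and the layout attributes of the layout result serving as the comparison source.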




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040117345A1

Filed: 2003-09-17     Issued: 2004-06-17

Ownership reassignment in a shared-nothing database system

(Original Assignee) Oracle International Corp     (Current Assignee) Oracle International Corp

Roger Bamford, Sashikanth Chandrasekaran, Angelo Pruscino
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (based partitioning) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data, time t) group has a different schema (persistent data) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040117345A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data (different schema) items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item resides at said particular location ;
while the first node continues to operate , reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage ;
after the reassignment , when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item as said particular data item resides at said particular location .

US20040117345A1
CLAIM 23
. The method of claim 1 wherein : an operation that involves said particular data item is in-progress at the time t (first data, first data group) he transfer of ownership of said particular data item is to be performed ;
the method further includes the step of determining whether to wait for said in-progress operation to complete based on a set of one or more factors ;
and if it is determined to not wait for said in-progress operation to complete , aborting said in-progress operation .

US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data (first data, first data group) item and a second data item , wherein said first data item and said second data item are persistently stored data items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (based partitioning) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .
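The range-based partitioning recited in claim 28 of US20040117345A1 above assigns each data item to a bucket according to the key range it falls in. A minimal sketch follows, assuming integer keys and hypothetical range boundaries; the names BOUNDARIES, bucket_for, and assign_to_buckets are illustrative and do not come from the reference.

```python
from bisect import bisect_right

# Hypothetical boundaries: keys below 100 go to bucket 0,
# keys 100..199 to bucket 1, keys 200..299 to bucket 2, the rest to bucket 3.
BOUNDARIES = [100, 200, 300]


def bucket_for(key, boundaries=BOUNDARIES):
    """Return the index of the bucket whose key range contains `key`."""
    return bisect_right(boundaries, key)


def assign_to_buckets(keys, boundaries=BOUNDARIES):
    """Assign every key to exactly one bucket by range."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        buckets[bucket_for(key, boundaries)].append(key)
    return buckets
```

Note that this sketch places every item in exactly one bucket, whereas claim 8 of US8190610B2 recites providing all key/value pairs to all partitions.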

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task (based partitioning) ;

the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (persistent data) than the iterator corresponding to another particular data group , for that reducer .
US20040117345A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data (different schema) items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item resides at said particular location ;
while the first node continues to operate , reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage ;
after the reassignment , when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item as said particular data item resides at said particular location .
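The ownership scheme recited in claim 1 of US20040117345A1 above (exclusive ownership per data item, operations shipped to the owner, and reassignment without moving the item) can be sketched as follows. This is a single-process illustration under stated assumptions: the class SharedNothingCluster, its methods, and the dictionary that stands in for persistent storage are all hypothetical and are not the reference's implementation.

```python
class SharedNothingCluster:
    """Toy model of exclusive data-item ownership with operation shipping."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.storage = {}  # item_id -> value; stands in for persistent storage
        self.owner = {}    # item_id -> node holding exclusive ownership
        self.log = []      # (performing_node, item_id) for each operation

    def add_item(self, item_id, value, owner):
        """Store an item and assign its exclusive owner."""
        if owner not in self.nodes:
            raise ValueError(f"unknown node: {owner}")
        self.storage[item_id] = value
        self.owner[item_id] = owner

    def perform(self, requesting_node, item_id, operation):
        """Ship `operation` to the current owner, which applies it in place."""
        if requesting_node not in self.nodes:
            raise ValueError(f"unknown node: {requesting_node}")
        performing_node = self.owner[item_id]
        self.storage[item_id] = operation(self.storage[item_id])
        self.log.append((performing_node, item_id))
        return performing_node

    def reassign(self, item_id, new_owner):
        """Change only the ownership map; the stored item is not moved."""
        if new_owner not in self.nodes:
            raise ValueError(f"unknown node: {new_owner}")
        self.owner[item_id] = new_owner
```

In this model, a request from any node is always executed by whichever node owns the item at that moment, both before and after reassign() is called.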

US8190610B2
CLAIM 17
. A computer system (stored data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (based partitioning) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data, time t) group has a different schema (persistent data) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040117345A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data (different schema) items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item resides at said particular location ;
while the first node continues to operate , reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage ;
after the reassignment , when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item as said particular data item resides at said particular location .

US20040117345A1
CLAIM 23
. The method of claim 1 wherein : an operation that involves said particular data item is in-progress at the time t (first data, first data group) he transfer of ownership of said particular data item is to be performed ;
the method further includes the step of determining whether to wait for said in-progress operation to complete based on a set of one or more factors ;
and if it is determined to not wait for said in-progress operation to complete , aborting said in-progress operation .

US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data (first data, first data group) item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 18
. The computer system (stored data) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 19
. The computer system (stored data) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 20
. The computer system (stored data) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 21
. The computer system (stored data) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 22
. The computer system (stored data) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (persistent data) than the iterator corresponding to another particular data group , for that reducer .
US20040117345A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data (different schema) items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item resides at said particular location ;
while the first node continues to operate , reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage ;
after the reassignment , when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item as said particular data item resides at said particular location .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 23
. The computer system (stored data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 24
. The computer system (stored data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 25
. The computer system (stored data) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .
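Claims 23 to 25 of US8190610B2 above distinguish three ways of routing intermediate key/value pairs to reducer partitions: each pair to a separate single partition, at least some pairs to more than one partition, and all pairs to all partitions. A minimal sketch of the three variants follows. The function names, the CRC32-based routing, and the replicated_keys parameter are assumptions for illustration only and are not drawn from the patent.

```python
import zlib


def _index(key, n):
    """Deterministic partition index for a key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def partition_exclusive(pairs, n):
    """Claim 23 style: each key/value pair goes to exactly one partition."""
    partitions = [[] for _ in range(n)]
    for key, value in pairs:
        partitions[_index(key, n)].append((key, value))
    return partitions


def partition_overlapping(pairs, n, replicated_keys):
    """Claim 24 style: pairs with a replicated key go to every partition."""
    partitions = [[] for _ in range(n)]
    for key, value in pairs:
        targets = range(n) if key in replicated_keys else [_index(key, n)]
        for i in targets:
            partitions[i].append((key, value))
    return partitions


def partition_broadcast(pairs, n):
    """Claim 25 style: every pair goes to every partition."""
    pairs = list(pairs)  # materialize so generators can be copied n times
    return [list(pairs) for _ in range(n)]
```

Each returned partition would then be handed to a separate reducer. Replicating selected keys is only one way to send some pairs to more than one partition.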

US8190610B2
CLAIM 26
. The computer system (stored data) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task (based partitioning) ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 27
. The computer system (stored data) of claim 26 , wherein : the reducing includes processing the metadata .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 28
. The computer system (stored data) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 29
. The computer system (stored data) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 30
. The computer system (stored data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 31
. The computer system (stored data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 32
. The computer system (stored data) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (persistent data) over a computer system (stored data) , the method comprising : for a first data (first data, time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (based partitioning) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040117345A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data (different schema) items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item resides at said particular location ;
while the first node continues to operate , reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage ;
after the reassignment , when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item as said particular data item resides at said particular location .

US20040117345A1
CLAIM 19
. The method of claim 1 wherein : the step of reassigning ownership of the particular data item from the particular node to another node is performed without waiting for a transaction that is modifying the data item to commit ;
the transaction makes a first set (first set) of modifications while the particular data item is owned by the particular node ;
and the transaction makes a second set (second set) of modifications while the particular data item is owned by said other node .

US20040117345A1
CLAIM 23
. The method of claim 1 wherein : an operation that involves said particular data item is in-progress at the time t (first data, first data group) he transfer of ownership of said particular data item is to be performed ;
the method further includes the step of determining whether to wait for said in-progress operation to complete based on a set of one or more factors ;
and if it is determined to not wait for said in-progress operation to complete , aborting said in-progress operation .

US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data (first data, first data group) item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .
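
The map and reduce steps recited in claim 33 of US8190610B2 can be illustrated with a minimal single-process sketch. It is not the patentee's implementation. The sample records, the group names, and the helper functions (`map_employee`, `map_department`, `map_phase`, `reduce_join`, `map_reduce`) are all hypothetical, and a real system would run the mappers and reducers on separate machines.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share the key "dept_id".
employees = [
    {"emp": "ann", "dept_id": 1},
    {"emp": "bob", "dept_id": 2},
    {"emp": "cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept_name": "search"},
    {"dept_id": 2, "dept_name": "ads"},
]

def map_employee(record):
    # Group-specific map for the first schema: emit (common key, value).
    yield record["dept_id"], record["emp"]

def map_department(record):
    # Group-specific map for the second schema.
    yield record["dept_id"], record["dept_name"]

def map_phase(groups):
    # Intermediate data is kept per key and per group, so it stays
    # identifiable to the group that produced it.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for record in records:
            for key, value in map_fn(record):
                intermediate[key][group_name].append(value)
    return intermediate

def reduce_join(key, emp_iter, dept_iter):
    # One iterator per group; the output schema (emp, dept_name)
    # differs from both input schemas.
    dept_names = list(dept_iter)
    for emp in emp_iter:
        for name in dept_names:
            yield {"emp": emp, "dept_name": name}

def map_reduce(groups):
    intermediate = map_phase(groups)
    output = []
    for key in sorted(intermediate):
        by_group = intermediate[key]
        output.extend(
            reduce_join(key, iter(by_group["employees"]), iter(by_group["departments"]))
        )
    return output

result = map_reduce({
    "employees": (employees, map_employee),
    "departments": (departments, map_department),
})
```

Under these assumptions, `result` holds one merged record per employee, joined to the department name through the common key.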

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task (based partitioning) , the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .
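
For orientation, the range-based partitioning named in claim 28 of US20040117345A1 can be sketched as follows. This is a generic illustration of assigning items to buckets by key range, not the reference's actual implementation. The boundary values and the function name `range_bucket` are hypothetical.

```python
import bisect

# Hypothetical bucket boundaries: bucket 0 holds keys below 10,
# bucket 1 holds keys 10 to 19, bucket 2 holds keys 20 and above.
boundaries = [10, 20]

def range_bucket(key, boundaries):
    # bisect_right returns how many boundaries are <= key,
    # which is the index of the range the key falls into.
    return bisect.bisect_right(boundaries, key)

keys = [3, 10, 15, 20, 42]
buckets = [range_bucket(k, boundaries) for k in keys]
```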

US8190610B2
CLAIM 40
. A computer system (stored data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data, time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (based partitioning) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (persistent data) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040117345A1
CLAIM 1
. A method for managing data , the method comprising the steps of : maintaining a plurality of persistent data (different schema) items on persistent storage accessible to a plurality of nodes , the persistent data items including a particular data item stored at a particular location on said persistent storage ;
assigning exclusive ownership of each of the persistent data items to one of the nodes , wherein a particular node of said plurality of nodes is assigned exclusive ownership of said particular data item ;
when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to the particular node for the particular node to perform the operation on the particular data item as said particular data item resides at said particular location ;
while the first node continues to operate , reassigning ownership of the particular data item from the particular node to another node without moving the particular data item from said particular location on said persistent storage ;
after the reassignment , when any node wants an operation performed that involves said particular data item , the node that desires the operation to be performed ships the operation to said other node for the other node to perform the operation on the particular data item as said particular data item resides at said particular location .

US20040117345A1
CLAIM 19
. The method of claim 1 wherein : the step of reassigning ownership of the particular data item from the particular node to another node is performed without waiting for a transaction that is modifying the data item to commit ;
the transaction makes a first set (first set) of modifications while the particular data item is owned by the particular node ;
and the transaction makes a second set (second set) of modifications while the particular data item is owned by said other node .

US20040117345A1
CLAIM 23
. The method of claim 1 wherein : an operation that involves said particular data item is in-progress at the time t (first data, first data group) he transfer of ownership of said particular data item is to be performed ;
the method further includes the step of determining whether to wait for said in-progress operation to complete based on a set of one or more factors ;
and if it is determined to not wait for said in-progress operation to complete , aborting said in-progress operation .

US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data (first data, first data group) item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .
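
The ownership behavior recited in claims 1 and 67 of US20040117345A1 (operations are shipped to the exclusive owner, and ownership can move while the data stays in place) can be sketched with a toy in-memory model. The class `SharedNothingCluster`, its method names, and the node and location labels are hypothetical and exist only for illustration.

```python
class SharedNothingCluster:
    """Toy model: each data item has one exclusive owner and a fixed storage location."""

    def __init__(self):
        self.location = {}  # item -> location on persistent storage (never changed here)
        self.value = {}     # item -> current value
        self.owner = {}     # item -> node that exclusively owns the item

    def add_item(self, item, location, value, owner):
        self.location[item] = location
        self.value[item] = value
        self.owner[item] = owner

    def ship_operation(self, requester, item, operation):
        # The requesting node does not touch the data; the operation is
        # shipped to, and executed by, the current owner.
        executor = self.owner[item]
        self.value[item] = operation(self.value[item])
        return executor

    def reassign(self, item, new_owner):
        # Only the ownership mapping changes; the location is untouched.
        self.owner[item] = new_owner

cluster = SharedNothingCluster()
cluster.add_item("row1", "disk0:block7", 10, owner="node_a")
cluster.add_item("row2", "disk0:block8", 20, owner="node_a")

first = cluster.ship_operation("node_c", "row1", lambda v: v + 1)
cluster.reassign("row1", "node_b")
second = cluster.ship_operation("node_c", "row1", lambda v: v + 1)
third = cluster.ship_operation("node_c", "row2", lambda v: v * 2)
```

In this sketch the first node keeps serving `row2` after `row1` is handed to the second node, mirroring the last limitation of claim 67.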

US8190610B2
CLAIM 41
. The computer system (stored data) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 42
. The computer system (stored data) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 43
. The computer system (stored data) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 44
. The computer system (stored data) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task (based partitioning) , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040117345A1
CLAIM 28
. The method of claim 25 wherein the step of assigning each data item to one of a plurality of buckets is performed using range-based partitioning (data partitions, partitioning step, combine task) .

US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 45
. The computer system (stored data) of claim 44 , wherein the reducing includes processing the metadata .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .

US8190610B2
CLAIM 46
. The computer system (stored data) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20040117345A1
CLAIM 67
. A method for use in a multi-node shared-nothing database system , the method comprising the steps of : a first node of said multi-node shared-nothing database system initially functioning as exclusive owner of a first data item and a second data item , wherein said first data item and said second data item are persistently stored data (computer system) items within a database managed by the multi-node shared-nothing database system ;
without changing the location of a first data item on persistent storage or shutting down said first node , reassigning ownership of the first data item from the first node to a second node of said multi-node shared-nothing database system ;
and after reassigning ownership , the first node continuing to operate as the owner of the second data item , and to handle all requests for operations on said second data item .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
WO2004028108A2

Filed: 2003-09-08     Issued: 2004-04-01

Method for archiving multimedia messages

(Original Assignee) Eastman Kodak Company     

Jean-Marie Vau, Joachim Moelle, Olivier Alain Christian Furon, Olivier Marc Rigault
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data (first data) server (1) and a second data (second data) server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 6
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (first terminal) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , each key/value pair of the intermediate data being provided to a separate one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 7
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (first terminal) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , at least some of the key/value pairs of the intermediate data being provided to more than one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .
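
Claims 6 and 7 of US8190610B2, charted above, differ only in how intermediate key/value pairs are assigned to partitions: in claim 6 each pair goes to a single partition, while in claim 7 at least some pairs go to more than one. The sketch below is a rough illustration of that distinction, not the patentee's implementation. The function names, the CRC32 hash choice, and the `broadcast_keys` parameter are all assumptions introduced for the example.

```python
from collections import defaultdict
from zlib import crc32


def _bucket(key, num_partitions):
    # Deterministic hash so a given key always lands in the same partition.
    # CRC32 is an arbitrary choice made for this sketch.
    return crc32(str(key).encode("utf-8")) % num_partitions


def hash_partition(pairs, num_partitions):
    """Claim 6 style: every (key, value) pair is sent to exactly one partition."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[_bucket(key, num_partitions)].append((key, value))
    return dict(partitions)


def replicating_partition(pairs, num_partitions, broadcast_keys):
    """Claim 7 style: pairs whose key is in broadcast_keys (hypothetical
    parameter) are copied to every partition; all others go to one."""
    partitions = defaultdict(list)
    for key, value in pairs:
        if key in broadcast_keys:
            targets = range(num_partitions)
        else:
            targets = [_bucket(key, num_partitions)]
        for index in targets:
            partitions[index].append((key, value))
    return dict(partitions)
```

Each resulting partition would then be handed to its own reducer, as the final limitation of both claims recites.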

US8190610B2
CLAIM 10
. The method of claim 9 , wherein : the reducing step (first terminal) includes processing the metadata .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step (first terminal) .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (first terminal) is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step (first terminal) further comprises processing data that is not intermediate data .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step (first terminal) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step (first terminal) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step (first terminal) includes relating the data among the plurality of data groups .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data (first data) server (1) and a second data (second data) server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data (first data) server (1) and a second data (second data) server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps (second set, reduce method) : a) from at least one multimedia message sent from a first terminal (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 39
. The map-reduce method of claim 38 , wherein iterating includes providing the associated metadata to the processing of the reducing step (first terminal) .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data (first data) server (1) and a second data (second data) server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps (second set, reduce method) : a) from at least one multimedia message sent from a first terminal (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .

US8190610B2
CLAIM 46
. The computer system of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step (first terminal) .
WO2004028108A2
CLAIM 1
. A method enabling the communication of at least one multimedia message between at least two terminals (3) , (4) located in a digital network comprising a first data server (1) and a second data server (2) , each data server (1) , (2) comprising at least one user data base (16) , (22) and a digital data storage means (15) , (21) , said method being characterized in that it enables multimedia messages to be synchronized and archived between the two data servers by automatically performing the following steps : a) from at least one multimedia message sent from a first terminal (reducing step) (3) and intended to be sent to a receiving address of a second terminal (4) , the contents of said multimedia message being temporarily saved in the first server (1) , determine a subscription identifier to a recipient's archiving service , the archiving service being specific to the second server (2) ;
b) associate the recipient's address with the subscription identifier to the archiving service of said recipient ;
c) send the contents of the multimedia message from the first server (1) to the second server (2) ;
d) archive the contents of the multimedia message in the second server (2) for an undetermined period .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1511232A1

Filed: 2003-08-28     Issued: 2005-03-02

A method for transmission of data packets through a network

(Original Assignee) Siemens AG; Nokia Siemens Networks GmbH and Co KG     (Current Assignee) Nokia Solutions and Networks GmbH and Co KG

Miguel De Vega Rodrigo, Robert Dr. Pleich
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1511232A1
CLAIM 4
The method as claimed in claim 1 , 2 or 3 , characterized in that , the answer is evaluated in the edge node and after expiring of a random time (first data, first data group) the header is sent again to the central node , in order to try a new reservation for a transmission of the burst or packet .

EP1511232A1
CLAIM 5
The method as claimed in claim 1 , 2 or 3 , characterized in that , in case of a reservation conflict the central node determines a time or a time lag for an occupancy of the central node and sends an answer , where said time (second data, second data group) or time lag is contained , to the originating edge node , where said answer is evaluated , the sending of the burst or packet is abandoned , the burst or packet is stored in the edge node and after expiring of said time or time lag a header is sent again to the central node , to try a new reservation for a transmission of said burst or packet .
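
For orientation, the grouped map/reduce recited in claim 1 of US8190610B2 above can be pictured with a small single-process sketch: two data groups with different schemas are mapped by different functions, the intermediate values stay identifiable to their group, and the reduce step merges them on the key they have in common. This is only an illustration under stated assumptions. The employee and department schemas, every function name, and the in-memory dictionaries are hypothetical, and the sketch skips the partitioning and distribution that the claim also requires.

```python
from collections import defaultdict


def map_employee(record):
    # Hypothetical schema for group "employee": (emp_id, name, dept_id).
    emp_id, name, dept_id = record
    yield dept_id, name


def map_department(record):
    # Hypothetical schema for group "department": (dept_id, dept_name).
    dept_id, dept_name = record
    yield dept_id, dept_name


def run_grouped_map(groups, mappers):
    # intermediate[key][group_name] -> list of values, so each value
    # remains identifiable to the data group that produced it.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, records in groups.items():
        for record in records:
            for key, value in mappers[group_name](record):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    # One loop per group stands in for the per-group iterators;
    # the two groups are merged on the shared key (dept_id).
    for dept_name in values_by_group.get("department", []):
        for name in values_by_group.get("employee", []):
            yield (key, dept_name, name)


def grouped_map_reduce(groups, mappers, reducer):
    intermediate = run_grouped_map(groups, mappers)
    output = []
    for key in sorted(intermediate):
        output.extend(reducer(key, intermediate[key]))
    return output
```

Calling `grouped_map_reduce` with an `"employee"` list and a `"department"` list yields one output tuple per employee, each carrying the department name looked up through the shared key, which is a third schema distinct from both inputs.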

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1511232A1
CLAIM 4
The method as claimed in claim 1 , 2 or 3 , characterized in that , the answer is evaluated in the edge node and after expiring of a random time (first data, first data group) the header is sent again to the central node , in order to try a new reservation for a transmission of the burst or packet .

EP1511232A1
CLAIM 5
The method as claimed in claim 1 , 2 or 3 , characterized in that , in case of a reservation conflict the central node determines a time or a time lag for an occupancy of the central node and sends an answer , where said time (second data, second data group) or time lag is contained , to the originating edge node , where said answer is evaluated , the sending of the burst or packet is abandoned , the burst or packet is stored in the edge node and after expiring of said time or time lag a header is sent again to the central node , to try a new reservation for a transmission of said burst or packet .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1511232A1
CLAIM 4
The method as claimed in claim 1 , 2 or 3 , characterized in that , the answer is evaluated in the edge node and after expiring of a random time (first data, first data group) the header is sent again to the central node , in order to try a new reservation for a transmission of the burst or packet .

EP1511232A1
CLAIM 5
The method as claimed in claim 1 , 2 or 3 , characterized in that , in case of a reservation conflict the central node determines a time or a time lag for an occupancy of the central node and sends an answer , where said time (second data, second data group) or time lag is contained , to the originating edge node , where said answer is evaluated , the sending of the burst or packet is abandoned , the burst or packet is stored in the edge node and after expiring of said time or time lag a header is sent again to the central node , to try a new reservation for a transmission of said burst or packet .

US8190610B2
CLAIM 40
A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1511232A1
CLAIM 4
The method as claimed in claim 1 , 2 or 3 , characterized in that , the answer is evaluated in the edge node and after expiring of a random time t (first data, first data group) he header is sent again to the central node , in order to try a new reservation for a transmission of the burst or packet .

EP1511232A1
CLAIM 5
The method as claimed in claim 1 , 2 or 3 , characterized in that , in case of a reservation conflict the central node determines a time or a time lag for an occupancy of the central node and sends an answer , where said time (second data, second data group) or time lag is contained , to the originating edge node , where said answer is evaluated , the sending of the burst or packet is abandoned , the burst or packet is stored in the edge node and after expiring of said time or time lag a header is sent again to the central node , to try a new reservation for a transmission of said burst or packet .




JP2005025303A

Filed: 2003-06-30     Issued: 2005-01-27

Database partitioned storage management apparatus, method, and program (データベース分割格納管理装置、方法及びプログラム)

(Original Assignee) Hitachi Ltd; 株式会社日立製作所     

Yukio Nakano, Hironori Sugimoto, 幸生 中野, 裕紀 杉本
US8190610B2
CLAIM 1
A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups (NAS) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2005025303A
CLAIM 5
コンピュ (processing data) ータに、請求項1から4までのいずれか1項に記載のデータベース分割格納管理処理を実行させるためのプログラム (corresponding different intermediate data) 。

JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。
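
For orientation, the flow recited in claim 1 of US8190610B2 can be sketched as follows. This is only an illustrative, single-process sketch: the group names (`employee`, `department`), the record layouts, and every function name are assumptions chosen for the example and do not come from the patent or from the cited reference, and nothing here is distributed. Each data group is partitioned and mapped by its own map function, the intermediate values stay identifiable to their group, and the reduce step reads one iterator per group to merge on the common key.

```python
from collections import defaultdict

# Hypothetical map functions, one per data group (schemas are assumptions).
def map_employee(record):
    emp_id, name, dept_id = record          # schema: (emp_id, name, dept_id)
    yield dept_id, (emp_id, name)           # key in common: dept_id

def map_department(record):
    dept_id, dept_name = record             # schema: (dept_id, dept_name)
    yield dept_id, dept_name

def partition(records, n):
    """Split one group's records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def run_map(groups, mappers, n_partitions=2):
    """Map every partition of every group; keep output tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in groups.items():
        for part in partition(records, n_partitions):
            for record in part:
                for key, value in mappers[group](record):
                    intermediate[key][group].append(value)
    return intermediate

def reduce_join(key, employee_iter, department_iter):
    """Reduce for one key: a separate iterator per group, merged on the key."""
    dept_names = list(department_iter)
    for emp_id, name in employee_iter:
        for dept_name in dept_names:
            yield (emp_id, name, key, dept_name)   # output has a third schema

def map_reduce(groups, mappers):
    intermediate = run_map(groups, mappers)
    output = []
    for key in sorted(intermediate):
        per_group = intermediate[key]
        output.extend(reduce_join(key,
                                  iter(per_group["employee"]),
                                  iter(per_group["department"])))
    return output
```

With two employees in department 10 and one in department 20, `map_reduce` would return one merged row per employee, each carrying the department name taken from the other group.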

US8190610B2
CLAIM 2
The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (NAS) .
JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。

US8190610B2
CLAIM 13
The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2005025303A
CLAIM 5
コンピュ (processing data) ータに、請求項1から4までのいずれか1項に記載のデータベース分割格納管理処理を実行させるためのプログラム。

US8190610B2
CLAIM 16
The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (NAS) .
JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。

US8190610B2
CLAIM 17
A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (NAS) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2005025303A
CLAIM 5
コンピュータに、請求項1から4までのいずれか1項に記載のデータベース分割格納管理処理を実行させるためのプログラム (corresponding different intermediate data) 。

JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。

US8190610B2
CLAIM 18
The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups (NAS) .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。

US8190610B2
CLAIM 19
The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 20
The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。
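
Claims 19 and 20 of US8190610B2 turn on intermediate data being stored so that it remains identifiable to its data group. One way to picture this, purely as an assumption for illustration, is to encode the group name in the file name of each stored partition. The naming scheme `<group>.part<N>.json`, the JSON encoding, and the helper names below are hypothetical and are not taken from the patent or the cited reference.

```python
import json
import os

def store_intermediate(base_dir, group, partition_id, pairs):
    """Write one partition of intermediate key/value pairs to a file whose
    name carries the data group (the 'manner of storage' metadata here)."""
    path = os.path.join(base_dir, f"{group}.part{partition_id}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(pairs, fh)
    return path

def group_of(path):
    """Recover the data group from the storage metadata (the file name)."""
    return os.path.basename(path).split(".part", 1)[0]

def load_by_group(base_dir):
    """Read every stored partition back, regrouped by data group."""
    by_group = {}
    for name in sorted(os.listdir(base_dir)):
        with open(os.path.join(base_dir, name), encoding="utf-8") as fh:
            pairs = [tuple(pair) for pair in json.load(fh)]
        by_group.setdefault(group_of(name), []).extend(pairs)
    return by_group
```

A reducer could then call `load_by_group` and hand each group's pairs to the iterator defined for that group, without inspecting the values themselves.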

US8190610B2
CLAIM 21
The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 22
The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 23
The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 24
The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 25
The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 26
The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 27
The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 28
The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 29
The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2005025303A
CLAIM 5
コンピュ (processing data) ータに、請求項1から4までのいずれか1項に記載のデータベース分割格納管理処理を実行させるためのプログラム。

JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 30
The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 31
The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 32
The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (NAS) .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。

US8190610B2
CLAIM 33
A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system (行うこと) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2005025303A
CLAIM 5
コンピュ (processing data) ータに、請求項1から4までのいずれか1項に記載のデータベース分割格納管理処理を実行させるためのプログラム。

JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 40
A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (NAS) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

JP2005025303A
CLAIM 8
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、NAS (data groups, output data groups) 装置により構成され表を格納したデータベースと、該データベースと関連付けされ該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部であって、前記NAS装置に設けられた第1指示制御部と、前記データベース管理システムに設けられた第2指示制御部と、を有し、 前記第2指示制御部は、表定義時又は表定義変更時に、データの分割条件と分割変更契機と分割変更方法とを前記データベース管理システムから取得し、前記第1指示制御部は、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うことを特徴とするデータベースシステム。

US8190610B2
CLAIM 41
The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 42
The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 43
The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 44
The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 45
The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。

US8190610B2
CLAIM 46
The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2005025303A
CLAIM 6
複数の格納領域に表データを分割して格納することができるデータベースシステムであって、表を格納したデータベースと、該データベースを管理するデータベース管理システムと、前記データベース内の表の分割を前記データベース管理システムに指示する指示制御部とを有し、 前記指示制御部は、表定義時又は表定義変更時に、データの分割範囲を相対値で指定した条件とその初期値を含む分割条件と分割変更契機と前記分割条件に指定した分割範囲相対値の初期値の変更値である分割変更方法とを前記データベース管理システムから取得し、前記分割変更契機の発生を監視し、前記分割変更契機を検知した際に、前記分割変更方法に基づいて分割変更を実行する制御を行うこと (computer system) を特徴とするデータベースシステム。




US20040036716A1

Filed: 2003-06-12     Issued: 2004-02-26

Data storage, retrieval, manipulation and display tools enabling multiple hierarchical points of view

(Original Assignee) Jordahl Jena J.     (Current Assignee) GLOBAL CONNECT TECHNOLOGY Inc

Jena Jordahl
US8190610B2
CLAIM 1
A method of processing data of a data set (confidence levels, data sets) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (weighting function) has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040036716A1
CLAIM 1
A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US20040036716A1
CLAIM 9
. The computer program of claim 1 , wherein the instructions to obtain significance and interest relations include instructions to : calculate significance and interest values based on at least one of locations of elements of the point of view in the graphical representation and positioning of connections between elements of the point of view in the graphical representation ;
set thresholds for the confidence levels for inclusion of members of the data sets based on the significance values calculated ;
and set weighting functions (first data group) for the elements based on the interest values calculated .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (confidence levels, data sets) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (weighting function) has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US20040036716A1
CLAIM 9
. The computer program of claim 1 , wherein the instructions to obtain significance and interest relations include instructions to : calculate significance and interest values based on at least one of locations of elements of the point of view in the graphical representation and positioning of connections between elements of the point of view in the graphical representation ;
set thresholds for the confidence levels for inclusion of members of the data sets based on the significance values calculated ;
and set weighting functions (first data group) for the elements based on the interest values calculated .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (confidence levels, data sets) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (weighting function) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (confidence levels, data sets) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (confidence levels, data sets) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US20040036716A1
CLAIM 9
. The computer program of claim 1 , wherein the instructions to obtain significance and interest relations include instructions to : calculate significance and interest values based on at least one of locations of elements of the point of view in the graphical representation and positioning of connections between elements of the point of view in the graphical representation ;
set thresholds for the confidence levels for inclusion of members of the data sets based on the significance values calculated ;
and set weighting functions (first data group) for the elements based on the interest values calculated .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (confidence levels, data sets) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (confidence levels, data sets) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (confidence levels, data sets) are provided to all of the reducers .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (confidence levels, data sets) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (weighting function) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (confidence levels, data sets) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (confidence levels, data sets) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US20040036716A1
CLAIM 9
. The computer program of claim 1 , wherein the instructions to obtain significance and interest relations include instructions to : calculate significance and interest values based on at least one of locations of elements of the point of view in the graphical representation and positioning of connections between elements of the point of view in the graphical representation ;
set thresholds for the confidence levels for inclusion of members of the data sets based on the significance values calculated ;
and set weighting functions (first data group) for the elements based on the interest values calculated .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (confidence levels, data sets) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (confidence levels, data sets) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (confidence levels, data sets) are provided to all of the reducers .
US20040036716A1
CLAIM 1
. A computer program tangibly stored on a computer-readable medium and operable to cause a computer to enable a user interface for encoding a graphical representation of a point of view , the computer program comprising instructions to : receive input from a user defining the graphical representation ;
manipulate the graphical representation to obtain a hierarchy based on the point of view ;
encode a structure of the hierarchy into data structures ;
obtain significance and interest relations for components of the hierarchy based on the structure of the hierarchy ;
apply the significance and interest relations to the data structures to obtain connectivity data ;
determine confidence levels (first set, second set, data set, output data set) for data sets (first set, second set, data set, output data set) related to the components of the hierarchy based on the connectivity data ;
and present the confidence levels and data sets to the user in the context of the hierarchy .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2004362449A

Filed: 2003-06-06     Issued: 2004-12-24

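Service providing apparatus, service coordinator apparatus, service providing method, service coordinating method, program, and computer-readable recording medium recording the program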

(Original Assignee) Mitsubishi Electric Corp; 三菱電機株式会社     

Yuji Aoki, 裕司 青木
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2004362449A
CLAIM 9
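A program [プログラム] (corresponding different intermediate data) , or a computer-readable recording medium on which the program is recorded, the program causing a computer [コンピュ] (processing data) to execute: a storage process of storing, in a storage device, a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a reception process of receiving request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management process of managing the current provision status of each service of the service group and determining, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage process, whether each service of the service group requested by the request information received by the reception process can be provided; and an execution process of executing, based on the result determined by the management process, provision of each service of the service group requested by the request information received by the reception process.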

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2004362449A
CLAIM 9
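A program, or a computer-readable recording medium on which the program is recorded, the program causing a computer [コンピュ] (processing data) to execute: a storage process of storing, in a storage device, a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a reception process of receiving request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management process of managing the current provision status of each service of the service group and determining, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage process, whether each service of the service group requested by the request information received by the reception process can be provided; and an execution process of executing, based on the result determined by the management process, provision of each service of the service group requested by the request information received by the reception process.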

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2004362449A
CLAIM 9
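A program [プログラム] (corresponding different intermediate data) , or a computer-readable recording medium on which the program is recorded, the program causing a computer to execute: a storage process of storing, in a storage device, a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a reception process of receiving request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management process of managing the current provision status of each service of the service group and determining, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage process, whether each service of the service group requested by the request information received by the reception process can be provided; and an execution process of executing, based on the result determined by the management process, provision of each service of the service group requested by the request information received by the reception process.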

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2004362449A
CLAIM 9
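A program, or a computer-readable recording medium on which the program is recorded, the program causing a computer [コンピュ] (processing data) to execute: a storage process of storing, in a storage device, a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a reception process of receiving request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management process of managing the current provision status of each service of the service group and determining, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage process, whether each service of the service group requested by the request information received by the reception process can be provided; and an execution process of executing, based on the result determined by the management process, provision of each service of the service group requested by the request information received by the reception process.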

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (記憶部) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2004362449A
CLAIM 1
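A service providing apparatus comprising: a storage unit [記憶部] (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a receiving unit that receives request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management unit that manages the current provision status of each service of the service group and determines, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored in the storage unit, whether each service of the service group requested by the request information received by the receiving unit can be provided; and an execution unit that executes, based on the result determined by the management unit, provision of each service of the service group requested by the request information received by the receiving unit.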

JP2004362449A
CLAIM 9
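A program, or a computer-readable recording medium on which the program is recorded, the program causing a computer [コンピュ] (processing data) to execute: a storage process of storing, in a storage device, a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a reception process of receiving request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management process of managing the current provision status of each service of the service group and determining, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage process, whether each service of the service group requested by the request information received by the reception process can be provided; and an execution process of executing, based on the result determined by the management process, provision of each service of the service group requested by the request information received by the reception process.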

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (記憶部) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2004362449A
CLAIM 1
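A service providing apparatus comprising: a storage unit [記憶部] (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a receiving unit that receives request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management unit that manages the current provision status of each service of the service group and determines, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored in the storage unit, whether each service of the service group requested by the request information received by the receiving unit can be provided; and an execution unit that executes, based on the result determined by the management unit, provision of each service of the service group requested by the request information received by the receiving unit.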

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (記憶部) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2004362449A
CLAIM 1
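A service providing apparatus comprising: a storage unit [記憶部] (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a receiving unit that receives request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management unit that manages the current provision status of each service of the service group and determines, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored in the storage unit, whether each service of the service group requested by the request information received by the receiving unit can be provided; and an execution unit that executes, based on the result determined by the management unit, provision of each service of the service group requested by the request information received by the receiving unit.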

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (記憶部) set are provided to all of the reducers .
JP2004362449A
CLAIM 1
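A service providing apparatus comprising: a storage unit [記憶部] (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services; a receiving unit that receives request information requesting provision of each service of a service group formed by combining some of the plurality of services as a group; a management unit that manages the current provision status of each service of the service group and determines, based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored in the storage unit, whether each service of the service group requested by the request information received by the receiving unit can be provided; and an execution unit that executes, based on the result determined by the management unit, provision of each service of the service group requested by the request information received by the receiving unit.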

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (記憶部) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2004362449A
CLAIM 1
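A service providing apparatus comprising : a storage unit (記憶部) (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services ; a receiving unit that receives request information requesting provision of each service of a service group in which some of the plurality of services are combined as a group ; a management unit that manages the current provision status of each service of the service group and determines , based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage unit , whether each service of the service group requested by the request information received by the receiving unit can be provided ; and an execution unit that executes provision of each service of the service group requested by the request information received by the receiving unit , based on the result determined by the management unit .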
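
For orientation when reading claim 40 against the cited reference, the claimed flow (two data groups with different schemas, mapped independently per partition, then reduced together on a common key) can be sketched roughly as follows. This is only an illustrative sketch and not the patentee's implementation: the employee/department schemas, the round-robin partitioner, and every function name are hypothetical.

```python
from collections import defaultdict


def partition(records, n):
    """Split a data set into n partitions (round-robin, an assumed scheme)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    # First schema (hypothetical): (employee_name, dept_id)
    for name, dept_id in part:
        yield dept_id, name


def map_departments(part):
    # Second schema (hypothetical): (dept_id, dept_title, city)
    for dept_id, title, city in part:
        yield dept_id, (title, city)


def run_map(records, map_fn, n_partitions=2):
    """Map each partition independently and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return intermediate


def reduce_join(first, second):
    """Iterate on the keys common to both intermediate sets and merge them."""
    output = []
    for key in sorted(first.keys() & second.keys()):
        for name in first[key]:
            for title, _city in second[key]:
                # The output schema (name, title) differs from both input schemas.
                output.append((name, title))
    return output


employees = [("Ann", 1), ("Bob", 2), ("Cy", 1)]
departments = [(1, "Search", "Sunnyvale"), (2, "Ads", "Burbank")]

joined = reduce_join(run_map(employees, map_employees),
                     run_map(departments, map_departments))
```

The department identifier plays the role of the "key in common"; the storage unit of JP2004362449A claim 1 would have to supply comparable structure for the mapping shown in this chart to hold.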

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (記憶部) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2004362449A
CLAIM 1
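A service providing apparatus comprising : a storage unit (記憶部) (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services ; a receiving unit that receives request information requesting provision of each service of a service group in which some of the plurality of services are combined as a group ; a management unit that manages the current provision status of each service of the service group and determines , based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage unit , whether each service of the service group requested by the request information received by the receiving unit can be provided ; and an execution unit that executes provision of each service of the service group requested by the request information received by the receiving unit , based on the result determined by the management unit .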

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data (記憶部) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2004362449A
CLAIM 1
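A service providing apparatus comprising : a storage unit (記憶部) (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services ; a receiving unit that receives request information requesting provision of each service of a service group in which some of the plurality of services are combined as a group ; a management unit that manages the current provision status of each service of the service group and determines , based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage unit , whether each service of the service group requested by the request information received by the receiving unit can be provided ; and an execution unit that executes provision of each service of the service group requested by the request information received by the receiving unit , based on the result determined by the management unit .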

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (記憶部) set are provided to all of the reducers .
JP2004362449A
CLAIM 1
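A service providing apparatus comprising : a storage unit (記憶部) (second intermediate data, second intermediate data set) that stores a plurality of pieces of service provision condition information indicating service provision conditions of a plurality of services ; a receiving unit that receives request information requesting provision of each service of a service group in which some of the plurality of services are combined as a group ; a management unit that manages the current provision status of each service of the service group and determines , based on the service provision conditions indicated by the plurality of pieces of service provision condition information stored by the storage unit , whether each service of the service group requested by the request information received by the receiving unit can be provided ; and an execution unit that executes provision of each service of the service group requested by the request information received by the receiving unit , based on the result determined by the management unit .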
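
Claims 42 and 43 turn on how intermediate key-value pairs are distributed to reducers: some pairs go to more than one partition (claim 42), and in the limiting case every pair goes to every reducer (claim 43). The sketch below illustrates both under stated assumptions; the `neighbours` routing rule, the integer keys, and the function names are hypothetical and are not taken from the patent.

```python
def partition_with_replication(pairs, n_reducers, targets_for):
    """Place each key-value pair in every partition chosen by targets_for."""
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        for idx in targets_for(key, n_reducers):
            partitions[idx].append((key, value))
    return partitions


def neighbours(key, n):
    # Hypothetical rule for claim 42: the home partition plus the next one,
    # so at least some pairs land in more than one partition.
    home = key % n
    return sorted({home, (home + 1) % n})


def broadcast(key, n):
    # Claim 43 limiting case: every pair is provided to every reducer.
    return range(n)


pairs = [(0, "a"), (1, "b"), (2, "c")]
overlapping = partition_with_replication(pairs, 3, neighbours)
everything = partition_with_replication(pairs, 3, broadcast)
```

Each inner list would then be handed to a separate reducer, which is the final step recited in claim 42.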




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040249789A1

Filed: 2003-06-04     Issued: 2004-12-09

Duplicate data elimination system

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Rahul Kapoor, Venkatesh Ganti, Surajit Chaudhuri
US8190610B2
CLAIM 17
. A computer system (evaluation data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
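
To make the cited claim 15 easier to compare with claim 17, the grouping it recites (token-based similarity scores, a threshold, groups as connected nodes of a graph, one canonical record per group) can be sketched as below. The reference claim does not fix a similarity measure or a canonical-selection rule, so Jaccard similarity over whitespace tokens and "highest total similarity within the group" are assumptions made purely for illustration, as are the sample records.

```python
def jaccard(a, b):
    """Token-set similarity between two records (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def group_records(records, threshold):
    """Link records whose similarity exceeds the threshold; return index groups."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) > threshold:
                parent[find(j)] = find(i)  # edge between nodes i and j

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def canonical(records, group):
    """Pick the record most similar to the rest of its group (assumed rule)."""
    return max(group, key=lambda i: sum(
        jaccard(records[i], records[j]) for j in group if j != i))


records = ["acme corp seattle", "acme corp seattle wa",
           "acme corporation seattle", "zenith ltd boston"]
groups = group_records(records, threshold=0.45)
canon = [canonical(records, g) for g in groups]
```

Nothing in this sketch partitions data by key or applies per-group map functions, which is the gap a reader would need to weigh when assessing the mapping to claim 17.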

US8190610B2
CLAIM 18
. The computer system (evaluation data) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 19
. The computer system (evaluation data) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 20
. The computer system (evaluation data) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 21
. The computer system (evaluation data) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 22
. The computer system (evaluation data) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
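
Claims 21 and 22 recite an iterator that corresponds to each data group inside a reducer. A minimal sketch of that idea, assuming two hypothetical groups named `profile` and `clicks` whose values have different shapes, might look like this; none of these names come from the patent.

```python
def iter_profile(values):
    # Hypothetical group "profile": each value is a (name, country) tuple.
    for name, country in values:
        yield "name", name
        yield "country", country


def iter_clicks(values):
    # Hypothetical group "clicks": each value is a URL; emit a single count.
    yield "click_count", sum(1 for _ in values)


def reduce_with_group_iterators(key, grouped_values, iterators):
    """Walk each group's values with that group's own iterator and merge."""
    merged = {"key": key}
    for group, values in grouped_values.items():
        for field, value in iterators[group](values):
            merged[field] = value
    return merged


row = reduce_with_group_iterators(
    7,
    {"profile": [("Ann", "US")], "clicks": ["/a", "/b", "/a"]},
    {"profile": iter_profile, "clicks": iter_clicks},
)
```

The reducer itself stays generic; the group-specific handling lives entirely in the iterator chosen for each group.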

US8190610B2
CLAIM 23
. The computer system (evaluation data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
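
Claim 23 describes the non-overlapping case, in which each intermediate key/value pair is provided to exactly one partition and each partition to its own reducer. One common way to achieve that is hashing the key, sketched below; the use of `zlib.crc32` as the hash and the function names are assumptions chosen for illustration only.

```python
import zlib


def partition_of(key, n_partitions):
    """Deterministically map a key to one partition index (assumed CRC32 hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def partition_exclusive(pairs, n_partitions):
    """Send every key/value pair to exactly one partition."""
    parts = [[] for _ in range(n_partitions)]
    for key, value in pairs:
        parts[partition_of(key, n_partitions)].append((key, value))
    return parts


pairs = [("u1", 10), ("u2", 20), ("u1", 30), ("u3", 40)]
parts = partition_exclusive(pairs, 3)
```

Because the partition depends only on the key, all values for a given key reach the same reducer, in contrast to the replicated partitioning of claims 24 and 25.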

US8190610B2
CLAIM 24
. The computer system (evaluation data) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 25
. The computer system (evaluation data) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 26
. The computer system (evaluation data) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
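
Claim 26 adds a sort, group-by-key and combine task within the reducers, together with metadata about those steps. The following sketch shows one way such a task could be arranged; the metadata fields and the choice of `sum` as the combiner are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sort_group_combine(pairs, combine):
    """Sort pairs by key, group equal keys, and combine each group's values."""
    ordered = sorted(pairs, key=itemgetter(0))
    combined = [
        (key, combine([value for _, value in group]))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]
    # Hypothetical metadata describing the sorting and grouping steps.
    metadata = {
        "sorted_by": "key",
        "input_pairs": len(pairs),
        "distinct_keys": len(combined),
    }
    return combined, metadata


combined, meta = sort_group_combine([("b", 2), ("a", 1), ("b", 5)], sum)
```

A downstream reduce step could read `meta` alongside the combined pairs, which is the kind of metadata processing claim 27 goes on to recite.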

US8190610B2
CLAIM 27
. The computer system (evaluation data) of claim 26 , wherein : the reducing includes processing the metadata .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 28
. The computer system (evaluation data) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 29
. The computer system (evaluation data) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 30
. The computer system (evaluation data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 31
. The computer system (evaluation data) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 32
. The computer system (evaluation data) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (evaluation data) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (more data records) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040249789A1
CLAIM 11
. A system for process for removing duplicate data records from a set of data records comprising : a database management system containing a number of data records contained in one or more tables from which one or more data records (output data set) are removed ;
a processor for identifying tokens contained within the data records and classifying the tokens according to attribute field and wherein said processor assigns a similarity score to data records in the reference table in relation to other data records based on a similarity between tokens of said data records ;
and wherein said processor groups together data records whose similarity score with respect to each other is greater than a threshold to form one or more groups of data records that form nodes of a graph wherein edges between nodes represent a similarity score between records of a group ;
and then identifies a canonical record based on the similarity of data records to each other within the group .

US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (more data records) is a merging of a portion of the first and second intermediate data set .
US20040249789A1
CLAIM 11
. A system for process for removing duplicate data records from a set of data records comprising : a database management system containing a number of data records contained in one or more tables from which one or more data records (output data set) are removed ;
a processor for identifying tokens contained within the data records and classifying the tokens according to attribute field and wherein said processor assigns a similarity score to data records in the reference table in relation to other data records based on a similarity between tokens of said data records ;
and wherein said processor groups together data records whose similarity score with respect to each other is greater than a threshold to form one or more groups of data records that form nodes of a graph wherein edges between nodes represent a similarity score between records of a group ;
and then identifies a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 40
. A computer system (evaluation data) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (more data records) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040249789A1
CLAIM 11
. A system for process for removing duplicate data records from a set of data records comprising : a database management system containing a number of data records contained in one or more tables from which one or more data records (output data set) are removed ;
a processor for identifying tokens contained within the data records and classifying the tokens according to attribute field and wherein said processor assigns a similarity score to data records in the reference table in relation to other data records based on a similarity between tokens of said data records ;
and wherein said processor groups together data records whose similarity score with respect to each other is greater than a threshold to form one or more groups of data records that form nodes of a graph wherein edges between nodes represent a similarity score between records of a group ;
and then identifies a canonical record based on the similarity of data records to each other within the group .

US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
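
As an aid to reading claim 40 of US8190610B2 above, the following is a minimal single-process sketch of the recited flow: two data sets with different schemas are partitioned and mapped independently, and the two intermediate data sets are reduced together by iterating on a key they have in common. The sample data, the function names, and the choice of `dept_id` as the common key are all hypothetical, and the sketch is not the patentee's implementation.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas.
employees = [("alice", 1), ("bob", 2), ("carol", 1)]   # (emp_name, dept_id)
departments = [(1, "search"), (2, "ads")]              # (dept_id, dept_name)

def map_employee(record):
    """Map function for the first schema; emits the common key dept_id."""
    name, dept_id = record
    yield dept_id, name

def map_department(record):
    """Map function for the second schema; emits the same common key."""
    dept_id, dept_name = record
    yield dept_id, dept_name

def partition(data, n):
    """Split a data set into n partitions by striding over its records."""
    return [data[i::n] for i in range(n)]

def run_map(data, map_fn, n_partitions=2):
    """Map each partition independently and collect key -> list of values."""
    intermediate = defaultdict(list)
    for part in partition(data, n_partitions):
        for record in part:
            for key, value in map_fn(record):
                intermediate[key].append(value)
    return intermediate

def reduce_join(first, second):
    """Iterate on keys found in both intermediate sets and merge their values."""
    output = []
    for key in sorted(first.keys() & second.keys()):
        for name in first[key]:
            for dept_name in second[key]:
                # Output schema (emp_name, dept_name) differs from both inputs.
                output.append((name, dept_name))
    return output

first = run_map(employees, map_employee)
second = run_map(departments, map_department)
joined = reduce_join(first, second)
```

The output pairs combine a portion of each intermediate data set, which is the kind of merging that dependent claim 41 recites.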

US8190610B2
CLAIM 41
. The computer system (evaluation data) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (more data records) is a merging of a portion of the first and second intermediate data set .
US20040249789A1
CLAIM 11
. A system for process for removing duplicate data records from a set of data records comprising : a database management system containing a number of data records contained in one or more tables from which one or more data records (output data set) are removed ;
a processor for identifying tokens contained within the data records and classifying the tokens according to attribute field and wherein said processor assigns a similarity score to data records in the reference table in relation to other data records based on a similarity between tokens of said data records ;
and wherein said processor groups together data records whose similarity score with respect to each other is greater than a threshold to form one or more groups of data records that form nodes of a graph wherein edges between nodes represent a similarity score between records of a group ;
and then identifies a canonical record based on the similarity of data records to each other within the group .

US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 42
. The computer system (evaluation data) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , and the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 43
. The computer system (evaluation data) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
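
The partitioning recited in claims 42 and 43 of US8190610B2 above, where some or all intermediate key-value pairs are provided to more than one reducer partition, can be sketched as follows. The `replicate` predicate, the modulo assignment of integer keys, and the sample pairs are assumptions made for illustration; the claims do not prescribe how the replicated pairs are selected.

```python
def partition_with_replication(pairs, n_reducers, replicate):
    """Assign (key, value) pairs to reducer partitions.

    Pairs for which replicate(key) is true are copied to every partition;
    all other pairs go to exactly one partition chosen by key modulo.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if replicate(key):
            for part in partitions:
                part.append((key, value))
        else:
            partitions[key % n_reducers].append((key, value))
    return partitions

# Hypothetical intermediate key-value pairs with integer keys.
pairs = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]

# Claim 42 style: only some pairs (here, key 3) reach more than one partition.
some_replicated = partition_with_replication(pairs, 2, lambda key: key == 3)

# Claim 43 style: every pair is provided to every reducer.
all_replicated = partition_with_replication(pairs, 2, lambda key: True)
```

Each resulting partition would then be handed to a separate reducer.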

US8190610B2
CLAIM 44
. The computer system (evaluation data) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 45
. The computer system (evaluation data) of claim 44 , wherein the reducing includes processing the metadata .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .

US8190610B2
CLAIM 46
. The computer system (evaluation data) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20040249789A1
CLAIM 15
. Apparatus for finding a canonical data record for a set of two or more data records comprising : means for providing a reference table having a number of reference records from which canonical data records are identified ;
means for identifying reference table tokens contained within the reference records of the reference table and classifying the reference table tokens according to attribute field ;
and means for assigning a similarity score to evaluation data (computer system) records in the reference table in relation to other records based on a similarity between tokens of said evaluation data records ;
means for grouping together evaluation records whose similarity score with respect to each other is greater than a threshold to form groups of records that form nodes of a graph wherein edges between nodes represent a similarity between records of a group wherein each said group identifying a canonical record based on the similarity of data records to each other within the group .
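
Claims 44 to 46 of US8190610B2 above recite generating metadata during sorting and grouping and having the iterator provide that metadata to the reduce processing. A minimal sketch of one way this could look follows. The metadata fields (`group`, `sorted`, `pair_count`, `key_count`), the `MetadataIterator` class, and the sample pairs are hypothetical; the claims do not state what the metadata contains.

```python
from itertools import groupby
from operator import itemgetter

def sort_and_group(pairs, group_name):
    """Sort pairs by key, group values by key, and record metadata about the work."""
    ordered = sorted(pairs, key=itemgetter(0))
    grouped = {
        key: [value for _, value in items]
        for key, items in groupby(ordered, key=itemgetter(0))
    }
    metadata = {
        "group": group_name,          # which data group the values came from
        "sorted": True,               # the values were produced by a sort step
        "pair_count": len(ordered),
        "key_count": len(grouped),
    }
    return grouped, metadata

class MetadataIterator:
    """Iterator over a key's values that also carries the associated metadata."""

    def __init__(self, values, metadata):
        self._values = iter(values)
        self.metadata = metadata

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._values)

def reduce_with_metadata(key, iterator):
    """Reduce step that reads both the values and the iterator's metadata."""
    values = list(iterator)
    return key, values, iterator.metadata["group"]

# Hypothetical intermediate pairs for one data group.
grouped, metadata = sort_and_group([(2, "b"), (1, "a"), (1, "c")], "employee")
result = reduce_with_metadata(1, MetadataIterator(grouped[1], metadata))
```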




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030208468A1

Filed: 2003-04-15     Issued: 2003-11-06

Method, system and apparatus for measuring and analyzing customer business volume

(Original Assignee) EXCHANGE SYNERGISM Ltd     (Current Assignee) Objective Business Services Inc

David McNab, Hugh Oddie
US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with another reducer .
US20030208468A1
CLAIM 2
) The method claimed in claim 1 , whereby said business volume data includes data (includes data) relating to data flows into , out of , and among account , product or customer classifications , within a business .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with that reducer .
US20030208468A1
CLAIM 2
) The method claimed in claim 1 , whereby said business volume data includes data (includes data) relating to data flows into , out of , and among account , product or customer classifications , within a business .

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .
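
Claims 21 and 22 of US8190610B2 above recite a reducer that employs a separate iterator for each data group. The sketch below shows one possible reading, in which intermediate data is stored per group and the reduce function receives one iterator per group for a given key. The group names, the nested-dictionary layout, and the sample values are hypothetical.

```python
def group_iterators(intermediate, key):
    """Return one iterator per data group over the values stored for key."""
    return {
        group: iter(values_by_key.get(key, []))
        for group, values_by_key in intermediate.items()
    }

def reduce_key(key, iterators):
    """Consume each group's iterator in a way specific to that group."""
    # The employee iterator is materialized once so it can be re-read
    # for every department value.
    names = list(iterators["employee"])
    for dept_name in iterators["department"]:
        for name in names:
            yield key, name, dept_name

# Hypothetical intermediate data, identifiable to its data group.
intermediate = {
    "employee": {1: ["alice", "carol"], 2: ["bob"]},
    "department": {1: ["search"], 2: ["ads"]},
}
rows_for_key_1 = list(reduce_key(1, group_iterators(intermediate, 1)))
```

Because each group has its own iterator, the reduce function can nest or order the iterations however the merge requires.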

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .
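
Claim 23 of US8190610B2 above recites partitioning in which each intermediate key/value pair goes to a separate one of the partitions, with each partition served by its own reducer. A minimal sketch follows; the use of a CRC32 checksum of the key's string form as the partition function and the sample pairs are assumptions, chosen only because the result is deterministic across runs.

```python
import zlib

def partition_by_key(pairs, n_partitions):
    """Place each (key, value) pair in exactly one partition, chosen by its key."""
    partitions = [[] for _ in range(n_partitions)]
    for key, value in pairs:
        index = zlib.crc32(str(key).encode("utf-8")) % n_partitions
        partitions[index].append((key, value))
    return partitions

# Hypothetical intermediate key/value pairs.
pairs = [("k1", 10), ("k2", 20), ("k1", 30), ("k3", 40)]
partitions = partition_by_key(pairs, 3)
```

Pairs that share a key always land in the same partition, so a single reducer sees all values for that key.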

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .
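
Illustrative sketch (not part of either patent): claim 28 recites an iterator per data group that hands associated metadata to the reducing step. A minimal Python sketch of that idea follows. The names GroupIterator and reduce_with_metadata, and the "sorted" metadata flag, are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class GroupIterator:
    """Walks one data group's intermediate values for a single key.

    `metadata` is a hypothetical dict (here only a "sorted" flag) that
    travels with the values so the reduce step can inspect it.
    """
    group: str
    values: List[Any]
    metadata: Dict[str, Any]

    def __iter__(self) -> Iterator[Any]:
        return iter(self.values)


def reduce_with_metadata(key: str, iterators: List[GroupIterator]) -> Tuple[str, Dict[str, Any]]:
    """Compute the minimum value per group, using metadata when it helps."""
    result: Dict[str, Any] = {}
    for it in iterators:
        values = list(it)
        if not values:
            result[it.group] = None
        elif it.metadata.get("sorted"):
            # Metadata says the values are already ascending: take the first.
            result[it.group] = values[0]
        else:
            result[it.group] = min(values)
    return key, result
```

Here the reduce step handles each group differently depending on what its iterator reports, which is the only behavior the sketch is meant to show.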

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with another reducer .
US20030208468A1
CLAIM 2
) The method claimed in claim 1 , whereby said business volume data includes data (includes data) relating to data flows into , out of , and among account , product or customer classifications , within a business .

US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with that reducer .
US20030208468A1
CLAIM 2
) The method claimed in claim 1 , whereby said business volume data includes data (includes data) relating to data flows into , out of , and among account , product or customer classifications , within a business .

US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030208468A1
CLAIM 5
) The method claimed in claim 4 , whereby the threshold tests consist of : (a) establishing one or more benchmark values of said business data ;
(b) detecting changes to said business data to establish one or more comparison values (output data set) of said business data ;
(c) comparing said one or more benchmark values to said one or more comparison values to detect one or more changes in business volume ;
(d) establishing database flags in said database corresponding with said one or more changes in business volume .

US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .
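
Illustrative sketch (not part of either patent): the claim 33 flow of mapping two data sets with different schemas and reducing them together on a common key can be pictured with the single-process Python sketch below. The employee and department schemas, and the names map_employee, map_department and map_reduce_join, are assumptions chosen only for illustration. Partitioning and distribution across machines are not modeled.

```python
from collections import defaultdict

# Hypothetical schemas, chosen only for illustration:
#   employees:   (dept_id, employee_name)
#   departments: (dept_id, dept_name)


def map_employee(record):
    dept_id, name = record
    yield dept_id, name  # dept_id is the key in common


def map_department(record):
    dept_id, dept_name = record
    yield dept_id, dept_name  # dept_id is the key in common


def map_reduce_join(employees, departments):
    # Map phase: each group has its own map function, and the
    # intermediate data stays identifiable to its group.
    emp_by_key = defaultdict(list)
    dept_by_key = defaultdict(list)
    for rec in employees:
        for key, value in map_employee(rec):
            emp_by_key[key].append(value)
    for rec in departments:
        for key, value in map_department(rec):
            dept_by_key[key].append(value)

    # Reduce phase: iterate on keys found in both intermediate sets and
    # emit rows whose schema differs from both inputs.
    output = []
    for key in sorted(emp_by_key.keys() & dept_by_key.keys()):
        for dept_name in dept_by_key[key]:
            for name in emp_by_key[key]:
                output.append((key, name, dept_name))
    return output
```

The output rows (dept_id, employee_name, dept_name) merge a portion of each intermediate set, which is the behavior claim 34 adds to claim 33.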

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20030208468A1
CLAIM 5
) The method claimed in claim 4 , whereby the threshold tests consist of : (a) establishing one or more benchmark values of said business data ;
(b) detecting changes to said business data to establish one or more comparison values (output data set) of said business data ;
(c) comparing said one or more benchmark values to said one or more comparison values to detect one or more changes in business volume ;
(d) establishing database flags in said database corresponding with said one or more changes in business volume .
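
Illustrative sketch (not part of either patent): the threshold tests of US20030208468A1 claim 5, steps (a) through (d), might be sketched in Python as below. The function name threshold_flags, the dict-based data layout, and the "increase"/"decrease" flag labels are assumptions for illustration. The reference does not specify them.

```python
def threshold_flags(benchmark, comparison, threshold):
    """Flag accounts whose value moved by at least `threshold`.

    benchmark:  dict of account -> benchmark value        (step a)
    comparison: dict of account -> later comparison value (step b)
    Returns a dict of account -> flag label               (steps c and d)
    """
    flags = {}
    for account, base in benchmark.items():
        # An account missing from `comparison` is treated as unchanged.
        current = comparison.get(account, base)
        change = current - base
        if abs(change) >= threshold:
            flags[account] = "increase" if change > 0 else "decrease"
    return flags
```

The sketch compares stored values account by account and contains no key-based grouping of heterogeneous data sets.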

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030208468A1
CLAIM 5
) The method claimed in claim 4 , whereby the threshold tests consist of : (a) establishing one or more benchmark values of said business data ;
(b) detecting changes to said business data to establish one or more comparison values (output data set) of said business data ;
(c) comparing said one or more benchmark values to said one or more comparison values to detect one or more changes in business volume ;
(d) establishing database flags in said database corresponding with said one or more changes in business volume .

US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20030208468A1
CLAIM 5
) The method claimed in claim 4 , whereby the threshold tests consist of : (a) establishing one or more benchmark values of said business data ;
(b) detecting changes to said business data to establish one or more comparison values (output data set) of said business data ;
(c) comparing said one or more benchmark values to said one or more comparison values to detect one or more changes in business volume ;
(d) establishing database flags in said database corresponding with said one or more changes in business volume .

US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .
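
Illustrative sketch (not part of either patent): claim 42 recites partitioning intermediate data so that some key-value pairs reach more than one partition, with each partition going to a separate reducer. One way to picture this is the Python sketch below. The function name partition_with_replication, the broadcast_keys parameter, and the modulo routing rule are assumptions for illustration. The claim does not prescribe how replicated pairs are chosen.

```python
def partition_with_replication(pairs, num_reducers, broadcast_keys):
    """Split (key, value) pairs into one partition per reducer.

    Pairs whose key is in `broadcast_keys` are copied to every partition.
    All other pairs go to exactly one partition, chosen by key modulo
    the reducer count. Integer keys are assumed.
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        if key in broadcast_keys:
            for partition in partitions:
                partition.append((key, value))
        else:
            partitions[key % num_reducers].append((key, value))
    return partitions
```

Setting broadcast_keys to every key in the input sends all pairs to all reducers, which corresponds to the limiting case recited in claim 43.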

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20030208468A1
CLAIM 17
) A computer program product for use on a computer system (computer system) to facilitate analyzing and measuring business volume in operation of a database and a database management utility linked to the database , the computer program product comprising : (a) a recording medium ;
and (b) means recorded on the recording medium for instructing the computer system to perform the steps of : (i) Providing to the database business data , including data related to accounts , hierarchical account/product relationships , and hierarchical account/customer relationships , said business data being organized in the database , said database and database management utility supporting relational data queries to said business data ;
(ii) Defining threshold change values regarding said business data , and associating said threshold change values to the database ;
(iii) Performing a plurality of threshold tests for detecting changes throughout a selected set of said business data in accordance with said threshold change values within a specified period of time ;
and (iv) Combining the results of the threshold tests to obtain a classification of data that produces business volume data .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040199533A1

Filed: 2003-04-01     Issued: 2004-10-07

Associative hash partitioning

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Pedro Celis, Lubor Kollar, Shailesh Vaishnavi
US8190610B2
CLAIM 1
. A method of processing data of a data set (said system) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set (said system) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (specific attribute) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040199533A1
CLAIM 1
. A method for associating a table entry with a partition from among N partitions , where N is an integer , said table entry comprising a key , said method comprising : generating a sequence of N pseudo-random numbers using said key as a seed ;
determining a position of a number with a specific attribute (second set) in said sequence with a pre-determined attribute ;
and associating said table entry with a partition corresponding to said position .

US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .
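
The partition-assignment steps recited in claims 1 and 20 of US20040199533A1 above can be sketched as follows. This is a minimal illustration only, not the reference's implementation: the function name `partition_for_key` is hypothetical, and the "predetermined attribute" is assumed here to be "the largest number in the sequence", which the claim text does not specify.

```python
import random


def partition_for_key(key: str, n_partitions: int) -> int:
    """Return a partition index in [0, n_partitions) for the given key."""
    # Seed a private generator with the key so the sequence is repeatable.
    rng = random.Random(key)
    # Generate a sequence of N pseudo-random numbers.
    sequence = [rng.random() for _ in range(n_partitions)]
    # Assumed attribute: the position of the largest number in the sequence.
    return max(range(n_partitions), key=sequence.__getitem__)


# The same key always maps to the same partition.
first = partition_for_key("customer-42", 8)
second = partition_for_key("customer-42", 8)
```

Because the sequence depends only on the key, any node can recompute the partition without shared state. Note that this addresses key-to-partition placement only; it does not by itself involve data groups with different schemas.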

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .
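
For orientation, the map and reduce steps recited in claims 33 and 34 of US8190610B2 can be sketched in a single process. This is a simplified sketch under stated assumptions: the sample records, the group names `employee` and `department`, and all function names are hypothetical, and the partitioning and distribution across machines recited in the claims are omitted.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key "dept_id".
employees = [{"emp": "Ann", "dept_id": 1}, {"emp": "Bob", "dept_id": 2}]
departments = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "Ops"}]


def map_employee(record):
    return record["dept_id"], record["emp"]


def map_department(record):
    return record["dept_id"], record["dept"]


def run_map(records, map_fn, group):
    """Apply one group's map function and tag the output with its group."""
    out = defaultdict(list)
    for record in records:
        key, value = map_fn(record)
        out[key].append(value)
    return group, out


def reduce_groups(intermediate):
    """Iterate over common keys, reading each group's values separately."""
    by_group = dict(intermediate)
    common = set.intersection(*(set(d) for d in by_group.values()))
    merged = []
    for key in sorted(common):
        for emp in by_group["employee"][key]:
            for dept in by_group["department"][key]:
                # The output schema differs from both input schemas.
                merged.append({"emp": emp, "dept": dept})
    return merged


result = reduce_groups([
    run_map(employees, map_employee, "employee"),
    run_map(departments, map_department, "department"),
])
```

Each group is mapped by its own function, the intermediate data stays identifiable to its group, and the reduce step merges on the key the two schemas have in common.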

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (said system) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (specific attribute) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040199533A1
CLAIM 1
. A method for associating a table entry with a partition from among N partitions , where N is an integer , said table entry comprising a key , said method comprising : generating a sequence of N pseudo-random numbers using said key as a seed ;
determining a position of a number with a specific attribute (second set) in said sequence with a pre-determined attribute ;
and associating said table entry with a partition corresponding to said position .

US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said system) so that the output data set is a merging of a portion of the first and second intermediate data set .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said system) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said system) are provided to all of the reducers .
US20040199533A1
CLAIM 20
. A system for determining a partition from among N partitions associated with a table entry , where N is an integer greater than 0 , said table entry comprising a key , said system (data set, first data set, second data set) comprising : a pseudo-random number generator for generating a sequence of N pseudo-random numbers using said key as a seed ;
and a number position determination module for determining a position in said sequence of a number in said sequence with a predetermined attribute .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030227924A1

Filed: 2003-02-04     Issued: 2003-12-11

Capacity allocation for networks having path length routing constraints

(Original Assignee) Nokia of America Corp     (Current Assignee) Nokia of America Corp

Muralidharan Kodialam, Tirunell Lakshman
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups (shortest path) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030227924A1
CLAIM 1
. A method of allocating demands of a plurality of connections transferring data through a network of nodes connected by links , the method comprising the steps of : (a) initializing a weight of each link based on a capacity of the link ;
(b) routing a demand for a current connection by the steps of : (b1) generating a minimum path weight for paths through nodes and links of the network for the current connection , wherein the minimum path weight is the least sum of link weights for links of each length-bounded path between a source node and a destination node of the current connection , (b2) determining an optimal length-bounded path through nodes and links of the network for the current connection based on the minimum path weight , (b3) routing a portion of a remainder of the demand as one or more flows over the length-bounded path , wherein the portion is based on a lesser value of the remainder of the demand and a minimum capacity of links in the length-bounded path , (b4) updating i) the link weight of each link based on the routed portion and a capacity of the link and ii) the remainder of the demand , and (b5) repeating step (first data, first data group, second data group) s (b1)-(b5) until the demand is routed ;
(d) repeating step (b) for each of the plurality of connections ;
(e) generating a scaling value based on a maximum ratio of a flow over a link and a capacity of the link .

US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .
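
Step (b1) of claim 1 of US20030227924A1 above, finding the least sum of link weights over length-bounded paths, can be sketched as follows. This is an illustration only and not the reference's method: it assumes "length-bounded" means "at most `max_hops` links" and swaps in a hop-limited Bellman-Ford relaxation; the function name and the sample network are hypothetical.

```python
def min_weight_bounded_path(links, source, dest, max_hops):
    """Return (weight, path) for the lightest path using at most max_hops
    links, or None if no such path exists.

    links maps a directed link (u, v) to its non-negative weight.
    """
    # best[node] holds the lightest known path with at most k links.
    best = {source: (0.0, [source])}
    for _ in range(max_hops):
        nxt = dict(best)
        for (u, v), weight in links.items():
            if u in best:
                cand = best[u][0] + weight
                if v not in nxt or cand < nxt[v][0]:
                    nxt[v] = (cand, best[u][1] + [v])
        best = nxt
    return best.get(dest)


# Hypothetical three-node network.
links = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 5.0}
one_hop = min_weight_bounded_path(links, "a", "c", 1)
two_hop = min_weight_bounded_path(links, "a", "c", 2)
```

With a bound of one link only the direct link qualifies; relaxing the bound to two links admits the lighter two-link path. The sketch concerns network routing and involves no key-value pairs, mapping functions, or reduce step.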

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (shortest path) .
US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (shortest path) .
US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (shortest path) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030227924A1
CLAIM 1
. A method of allocating demands of a plurality of connections transferring data through a network of nodes connected by links , the method comprising the steps of : (a) initializing a weight of each link based on a capacity of the link ;
(b) routing a demand for a current connection by the steps of : (b1) generating a minimum path weight for paths through nodes and links of the network for the current connection , wherein the minimum path weight is the least sum of link weights for links of each length-bounded path between a source node and a destination node of the current connection , (b2) determining an optimal length-bounded path through nodes and links of the network for the current connection based on the minimum path weight , (b3) routing a portion of a remainder of the demand as one or more flows over the length-bounded path , wherein the portion is based on a lesser value of the remainder of the demand and a minimum capacity of links in the length-bounded path , (b4) updating i) the link weight of each link based on the routed portion and a capacity of the link and ii) the remainder of the demand , and (b5) repeating step (first data, first data group, second data group) s (b1)-(b5) until the demand is routed ;
(d) repeating step (b) for each of the plurality of connections ;
(e) generating a scaling value based on a maximum ratio of a flow over a link and a capacity of the link .

US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (shortest path) .
US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .

US8190610B2
CLAIM 32
. The computer system of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (shortest path) .
US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030227924A1
CLAIM 1
. A method of allocating demands of a plurality of connections transferring data through a network of nodes connected by links , the method comprising the steps of : (a) initializing a weight of each link based on a capacity of the link ;
(b) routing a demand for a current connection by the steps of : (b1) generating a minimum path weight for paths through nodes and links of the network for the current connection , wherein the minimum path weight is the least sum of link weights for links of each length-bounded path between a source node and a destination node of the current connection , (b2) determining an optimal length-bounded path through nodes and links of the network for the current connection based on the minimum path weight , (b3) routing a portion of a remainder of the demand as one or more flows over the length-bounded path , wherein the portion is based on a lesser value of the remainder of the demand and a minimum capacity of links in the length-bounded path , (b4) updating i) the link weight of each link based on the routed portion and a capacity of the link and ii) the remainder of the demand , and (b5) repeating step (first data, first data group, second data group) s (b1)-(b5) until the demand is routed ;
(d) repeating step (b) for each of the plurality of connections ;
(e) generating a scaling value based on a maximum ratio of a flow over a link and a capacity of the link .

US20030227924A1
CLAIM 9
. A method of allocating link capacity for a plurality of connections transferring data through a network , the method comprising the steps of : (a) generating a graph of the network , wherein the network includes a plurality of nodes interconnected by a plurality of links ;
(b) forming a linear programming problem based on the plurality of connections , wherein i) each connection defines a length-bounded path for a demand and ii) the linear programming problem tends to maximize a first objective function based on a first set (first set) of constraints ;
(c) forming a dual of the linear programming problem , wherein the dual tends to minimize a second objective function based on a second set (second set) of constraints ;
and (d) solving the dual to generate a scaling factor and routing of the length-bounded path for each of the plurality of connections .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (shortest path) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030227924A1
CLAIM 1
. A method of allocating demands of a plurality of connections transferring data through a network of nodes connected by links , the method comprising the steps of : (a) initializing a weight of each link based on a capacity of the link ;
(b) routing a demand for a current connection by the steps of : (b1) generating a minimum path weight for paths through nodes and links of the network for the current connection , wherein the minimum path weight is the least sum of link weights for links of each length-bounded path between a source node and a destination node of the current connection , (b2) determining an optimal length-bounded path through nodes and links of the network for the current connection based on the minimum path weight , (b3) routing a portion of a remainder of the demand as one or more flows over the length-bounded path , wherein the portion is based on a lesser value of the remainder of the demand and a minimum capacity of links in the length-bounded path , (b4) updating i) the link weight of each link based on the routed portion and a capacity of the link and ii) the remainder of the demand , and (b5) repeating step (first data, first data group, second data group) s (b1)-(b5) until the demand is routed ;
(d) repeating step (b) for each of the plurality of connections ;
(e) generating a scaling value based on a maximum ratio of a flow over a link and a capacity of the link .

US20030227924A1
CLAIM 9
. A method of allocating link capacity for a plurality of connections transferring data through a network , the method comprising the steps of : (a) generating a graph of the network , wherein the network includes a plurality of nodes interconnected by a plurality of links ;
(b) forming a linear programming problem based on the plurality of connections , wherein i) each connection defines a length-bounded path for a demand and ii) the linear programming problem tends to maximize a first objective function based on a first set (first set) of constraints ;
(c) forming a dual of the linear programming problem , wherein the dual tends to minimize a second objective function based on a second set (second set) of constraints ;
and (d) solving the dual to generate a scaling factor and routing of the length-bounded path for each of the plurality of connections .

US20030227924A1
CLAIM 12
. The invention of claim 9 , wherein , for step (b) , the linear programming sizing problem maximizes the scaling factor as the objective function ;
and wherein : the first set of constraints are A) a sum , of all flows on each link is less than the link's capacity , B) each demand as a function of the scaling factor is routed through the network , and C) each flow over a link is non-negative ;
the dual minimizes a shortest length-bounded path weight for each of the plurality of connections ;
the second set of constraints are D) a sum of all link weights is less than the minimum shortest path (data groups) weight , E) each demand as a function of the minimum path weight is routed through the network , and F) each link weight is non-negative .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6990480B1

Filed: 2003-01-31     Issued: 2006-01-24

Information manager method and system

(Original Assignee) Trancept Ltd     (Current Assignee) Trancept Ltd

F. N. Burt
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups (respective value) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (given set) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6990480B1
CLAIM 1
. An information management method comprising the following steps executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set (different lists) of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .
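
For orientation, the processing recited in claim 1 of US8190610B2 can be sketched as follows. This is a minimal single-process illustration, not the patent's implementation: the employee and department tables, the dept_id key, and every function name are hypothetical, and the distributed execution the claim requires is not modeled. It shows two data groups with different schemas being partitioned, mapped by group-specific functions into intermediate data tagged with its group, and merged in the reduce step on the key they have in common.

```python
from collections import defaultdict
from itertools import product

# Hypothetical data groups with different schemas; "dept_id" is the common key.
EMPLOYEES = [(1, "Ada"), (2, "Bob"), (1, "Cy")]          # (dept_id, employee_name)
DEPARTMENTS = [(1, "Research", 3), (2, "Sales", 1)]      # (dept_id, dept_name, floor)


def partition(records, n):
    """Split one data group's records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the employee group: value is the employee name."""
    return [(dept_id, name) for dept_id, name in part]


def map_department(part):
    """Map function for the department group: value is (dept_name, floor)."""
    return [(dept_id, (dept_name, floor)) for dept_id, dept_name, floor in part]


def run_map(groups, n_partitions=2):
    """Run each group's map function per partition and keep the output
    identifiable to its group (key -> group name -> list of values)."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Handle each group's values in its own way, then merge on the key."""
    names = sorted(values_by_group.get("employee", []))
    depts = values_by_group.get("department", [])
    return [(key, name, dept_name)
            for name, (dept_name, _floor) in product(names, depts)]


def map_reduce(groups):
    """Map every group, then reduce key by key into one output data group."""
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


GROUPS = {
    "employee": (EMPLOYEES, map_employee),
    "department": (DEPARTMENTS, map_department),
}
result = map_reduce(GROUPS)
print(result)
```

The nested dictionary is only one way to keep intermediate values separated by group; it is chosen here so the reduce function can apply different handling to each group before merging.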

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (respective value) .
US6990480B1
CLAIM 1
. An information management method comprising the following steps executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (respective value) .
US6990480B1
CLAIM 1
. An information management method comprising the following steps executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (respective value) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (given set) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6990480B1
CLAIM 1
. An information management method comprising the following steps executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set (different lists) of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (respective value) .
US6990480B1
CLAIM 1
. An information management method comprising the following steps executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US8190610B2
CLAIM 32
. The computer system of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (respective value) .
US6990480B1
CLAIM 1
. An information management method comprising the following steps executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6990480B1
CLAIM 1
. An information management method comprising the following steps (second set, reduce method) executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US6990480B1
CLAIM 10
. The method of claim 1 , further comprising : receiving a user definition of a second user-interface page containing one or more second page fields , wherein at least one of the second page fields contains a calculation that references at least one first page field ;
performing the calculation in an instance of the second user-interface page ;
and upon change in value (output data set) of the first page field in an instance of the first user-interface page , sending the changed value to the instance of the second user-interface page to facilitate re-performance of the calculation .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US6990480B1
CLAIM 10
. The method of claim 1 , further comprising : receiving a user definition of a second user-interface page containing one or more second page fields , wherein at least one of the second page fields contains a calculation that references at least one first page field ;
performing the calculation in an instance of the second user-interface page ;
and upon change in value (output data set) of the first page field in an instance of the first user-interface page , sending the changed value to the instance of the second user-interface page to facilitate re-performance of the calculation .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (respective value) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6990480B1
CLAIM 1
. An information management method comprising the following steps (second set, reduce method) executed by a computer : receiving a user definition of a first user-interface page containing one or more first page fields including a plurality of first page reference fields , and for each first page reference field a respective agent set vector ;
receiving a user definition of multiple instances of the first user-interface page , wherein the user definition for each instance of the first user-interface page specifies a respective value (data groups) for each first page field including a respective set of first page reference field values , wherein the respective set of first page reference field values uniquely distinguishes each instance of the first user-interface page from each other instance of the first user-interface page ;
automatically storing each instance of the first user-interface page as a respective first page instance metaobject in a data storage medium without abstracting the instance of the first user-interface page into a database record , wherein each first page instance metaobject contains (A) first page attributes , including the agent set vector for each first page reference field , and (B) for each of the one or more first page fields , a respective first page field object that has (i) a value attribute that contains a field value , if any , that a user has entered into the first page field in the instance of the first page , (ii) a calculation attribute that contains a calculation if a calculation is assigned to the first page field , (iii) functional state attributes that cooperatively define a functional state of the first page field , and (iv) at least one appearance attribute that contains at least one appearance definition for the first page field ;
and thereafter (i) receiving from a user a given set of first page reference field values for the first user-interface page and (ii) responsively retrieving from the data storage medium a given first page instance metaobject whose reference field values match the given set of first page reference field values , and displaying a corresponding instance of the first user-interface page , including all first page field values defined by the given first page instance metaobject , whereby the user can readily retrieve a previously entered instance of the first user-interface page by simply entering the reference field values of the previously entered instance .

US6990480B1
CLAIM 10
. The method of claim 1 , further comprising : receiving a user definition of a second user-interface page containing one or more second page fields , wherein at least one of the second page fields contains a calculation that references at least one first page field ;
performing the calculation in an instance of the second user-interface page ;
and upon change in value (output data set) of the first page field in an instance of the first user-interface page , sending the changed value to the instance of the second user-interface page to facilitate re-performance of the calculation .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US6990480B1
CLAIM 10
. The method of claim 1 , further comprising : receiving a user definition of a second user-interface page containing one or more second page fields , wherein at least one of the second page fields contains a calculation that references at least one first page field ;
performing the calculation in an instance of the second user-interface page ;
and upon change in value (output data set) of the first page field in an instance of the first user-interface page , sending the changed value to the instance of the second user-interface page to facilitate re-performance of the calculation .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2004226214A

Filed: 2003-01-22     Issued: 2004-08-12

Map information processing device, system, method, and program therefor, and recording medium storing the program

(Original Assignee) Inkurimento P Kk; インクリメント・ピー株式会社     

Hidenori Maeda, 英範 前田
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data (有し一対) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2004226214A
CLAIM 1
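A map information processing device that distributes map information via a network in order to report the movement status of a moving body using the map information , wherein the map information includes : matching data having a plurality of pieces of point information , each of which has coordinate information and unique point-specific information and represents a predetermined point , and line segment information that has unique line-segment-specific information and connects a pair [有し一対] (corresponding intermediate data) of the pieces of point information , roads being represented by the point information and the line segment information , the matching data having a plurality of pieces of matching mesh information divided into predetermined regions ; and display data that has element data , corresponding to the matching data , on elements constituting a map of a predetermined region , the display data having a plurality of pieces of display mesh information divided into predetermined regions ; the device comprising : storage means for storing the map information ; information acquisition means for acquiring current position information on the current position of the moving body and destination information on a destination to which the moving body moves ; search means for searching , based on the current position information and the destination information , for a travel route along which the moving body moves , using the matching data ; and distribution control means for causing matching mesh information that includes the point information and the line segment information representing a road corresponding to the searched travel route , and display mesh information corresponding to a region other than the region of that matching mesh information , to be distributed via the network together with information on the travel route .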

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data (有し一対) for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2004226214A
CLAIM 1
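A map information processing apparatus that distributes map information via a network in order to report the travel status of a mobile body using the map information, wherein the map information comprises: matching data having a plurality of pieces of matching mesh information divided into predetermined regions, the matching data including a plurality of pieces of point information, each having coordinate information and unique point-specific information and representing a predetermined point, and line segment information, each having unique segment-specific information and connecting a pair [有し一対] (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data) of the pieces of point information, roads being represented by the point information and the line segment information; and display data corresponding to the matching data, having element data on elements that constitute a map of a predetermined region, and having a plurality of pieces of display mesh information divided into predetermined regions; the apparatus comprising: storage means for storing the map information; information acquisition means for acquiring current position information on the current position of the mobile body and destination information on the destination to which the mobile body travels; search means for searching, using the matching data and based on the current position information and the destination information, for a travel route along which the mobile body travels; and distribution control means for causing matching mesh information that includes the point information and the line segment information representing the roads corresponding to the found travel route, and display mesh information corresponding to regions other than the region of that matching mesh information, to be distributed via the network together with information on the travel route.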

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data (有し一対) for a data group being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2004226214A
CLAIM 1
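A map information processing apparatus that distributes map information via a network in order to report the travel status of a mobile body using the map information, wherein the map information comprises: matching data having a plurality of pieces of matching mesh information divided into predetermined regions, the matching data including a plurality of pieces of point information, each having coordinate information and unique point-specific information and representing a predetermined point, and line segment information, each having unique segment-specific information and connecting a pair [有し一対] (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data) of the pieces of point information, roads being represented by the point information and the line segment information; and display data corresponding to the matching data, having element data on elements that constitute a map of a predetermined region, and having a plurality of pieces of display mesh information divided into predetermined regions; the apparatus comprising: storage means for storing the map information; information acquisition means for acquiring current position information on the current position of the mobile body and destination information on the destination to which the mobile body travels; search means for searching, using the matching data and based on the current position information and the destination information, for a travel route along which the mobile body travels; and distribution control means for causing matching mesh information that includes the point information and the line segment information representing the roads corresponding to the found travel route, and display mesh information corresponding to regions other than the region of that matching mesh information, to be distributed via the network together with information on the travel route.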

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data (有し一対) for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2004226214A
CLAIM 1
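A map information processing apparatus that distributes map information via a network in order to report the travel status of a mobile body using the map information, wherein the map information comprises: matching data having a plurality of pieces of matching mesh information divided into predetermined regions, the matching data including a plurality of pieces of point information, each having coordinate information and unique point-specific information and representing a predetermined point, and line segment information, each having unique segment-specific information and connecting a pair [有し一対] (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data) of the pieces of point information, roads being represented by the point information and the line segment information; and display data corresponding to the matching data, having element data on elements that constitute a map of a predetermined region, and having a plurality of pieces of display mesh information divided into predetermined regions; the apparatus comprising: storage means for storing the map information; information acquisition means for acquiring current position information on the current position of the mobile body and destination information on the destination to which the mobile body travels; search means for searching, using the matching data and based on the current position information and the destination information, for a travel route along which the mobile body travels; and distribution control means for causing matching mesh information that includes the point information and the line segment information representing the roads corresponding to the found travel route, and display mesh information corresponding to regions other than the region of that matching mesh information, to be distributed via the network together with information on the travel route.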

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data (有し一対) for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2004226214A
CLAIM 1
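A map information processing apparatus that distributes map information via a network in order to report the travel status of a mobile body using the map information, wherein the map information comprises: matching data having a plurality of pieces of matching mesh information divided into predetermined regions, the matching data including a plurality of pieces of point information, each having coordinate information and unique point-specific information and representing a predetermined point, and line segment information, each having unique segment-specific information and connecting a pair [有し一対] (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data) of the pieces of point information, roads being represented by the point information and the line segment information; and display data corresponding to the matching data, having element data on elements that constitute a map of a predetermined region, and having a plurality of pieces of display mesh information divided into predetermined regions; the apparatus comprising: storage means for storing the map information; information acquisition means for acquiring current position information on the current position of the mobile body and destination information on the destination to which the mobile body travels; search means for searching, using the matching data and based on the current position information and the destination information, for a travel route along which the mobile body travels; and distribution control means for causing matching mesh information that includes the point information and the line segment information representing the roads corresponding to the found travel route, and display mesh information corresponding to regions other than the region of that matching mesh information, to be distributed via the network together with information on the travel route.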

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data (有し一対) for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2004226214A
CLAIM 1
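A map information processing apparatus that distributes map information via a network in order to report the travel status of a mobile body using the map information, wherein the map information comprises: matching data having a plurality of pieces of matching mesh information divided into predetermined regions, the matching data including a plurality of pieces of point information, each having coordinate information and unique point-specific information and representing a predetermined point, and line segment information, each having unique segment-specific information and connecting a pair [有し一対] (s corresponding data partition to form corresponding intermediate data, corresponding intermediate data) of the pieces of point information, roads being represented by the point information and the line segment information; and display data corresponding to the matching data, having element data on elements that constitute a map of a predetermined region, and having a plurality of pieces of display mesh information divided into predetermined regions; the apparatus comprising: storage means for storing the map information; information acquisition means for acquiring current position information on the current position of the mobile body and destination information on the destination to which the mobile body travels; search means for searching, using the matching data and based on the current position information and the destination information, for a travel route along which the mobile body travels; and distribution control means for causing matching mesh information that includes the point information and the line segment information representing the roads corresponding to the found travel route, and display mesh information corresponding to regions other than the region of that matching mesh information, to be distributed via the network together with information on the travel route.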
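
For orientation only, the map and reduce steps recited in US8190610B2 claims 1 and 17 above can be sketched as follows. This is a minimal single-process sketch, not the patentee's implementation and not a distributed system. The two data groups (EMPLOYEES, DEPARTMENTS), the shared key dept_id, and every function name are assumptions introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
EMPLOYEES = [  # schema: (dept_id, employee_name)
    (1, "Ada"), (2, "Bo"), (1, "Cy"),
]
DEPARTMENTS = [  # schema: (dept_id, dept_name, floor)
    (1, "Search", 3), (2, "Ads", 5),
]

def partition(records, n):
    """Split one data group's records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_employees(part):
    """Group-specific map: emit (key, value) with the employee name as value."""
    return [(dept_id, name) for dept_id, name in part]

def map_departments(part):
    """Group-specific map: emit (key, value) with (dept_name, floor) as value."""
    return [(dept_id, (dept_name, floor)) for dept_id, dept_name, floor in part]

def run_map(groups, n_partitions=2):
    """Map every partition of every group; keep intermediate data tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate

def reduce_join(key, values_by_group):
    """Merge the two groups' intermediate values on the common key."""
    out = []
    for dept_name, floor in values_by_group["departments"]:
        for name in sorted(values_by_group["employees"]):
            out.append((key, name, dept_name, floor))
    return out

def map_reduce(groups):
    """Run the map phase, then reduce each key in sorted order."""
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output

GROUPS = {
    "employees": (EMPLOYEES, map_employees),
    "departments": (DEPARTMENTS, map_departments),
}

if __name__ == "__main__":
    for row in map_reduce(GROUPS):
        print(row)
```

Each group is mapped by its own function, the intermediate values stay identifiable to their group, and the reduce step merges them on the shared key. The output rows have a schema that differs from both inputs.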




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
CN1517906A

Filed: 2003-01-14     Issued: 2004-08-04

File system and file management method (文件系统及文件管理方法)

(Original Assignee) Lenovo Beijing Ltd     (Current Assignee) Lenovo Beijing Ltd

鹏 张, 张鹏
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (第二个) group has a different schema (数据文件) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
CN1517906A
CLAIM 5
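. The file system of claim 4, characterized in that: each singly linked list consists of one or more index blocks in sequence; the index address is the address of the first index block, which represents the first physical storage location of the actual file in the data area; the first index block stores the address of the second index block, which represents the second [第二个] (first data, first data group, first data set, s corresponding data partition) physical storage location of the actual file in the data area; the second index block stores the address of the third index block, which represents the third physical storage location of the actual file in the data area; and so on, forming a singly linked list.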

CN1517906A
CLAIM 23
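. The file management method of claim 10, characterized in that: the database files comprise an index file and a data file [数据文件] (different schema) whose storage-space occupancy ratio is less than or equal to 1:3.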

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (进一步) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
CN1517906A
CLAIM 12
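. The file management method of claim 11, characterized in that: step 1) further [进一步] (partitioning step) comprises the following steps: a) reading data from the partition information record area of the file system to obtain basic file system information; b) reading data from the file record table record area of the file system to obtain basic file information and the index addresses of the singly linked lists, organizing them into corresponding database files, and saving them; c) reading data from the unallocated cluster record area of the file system to obtain information on the physical storage locations represented by the unallocated clusters of the data area, organizing it into corresponding database files, and saving them.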

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (数据文件) than the iterator corresponding to another particular data group , for that reducer .
CN1517906A
CLAIM 23
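. The file management method of claim 10, characterized in that: the database files comprise an index file and a data file [数据文件] (different schema) whose storage-space occupancy ratio is less than or equal to 1:3.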

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (第二个) group has a different schema (数据文件) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
CN1517906A
CLAIM 5
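. The file system of claim 4, characterized in that: each singly linked list consists of one or more index blocks in sequence; the index address is the address of the first index block, which represents the first physical storage location of the actual file in the data area; the first index block stores the address of the second index block, which represents the second [第二个] (first data, first data group, first data set, s corresponding data partition) physical storage location of the actual file in the data area; the second index block stores the address of the third index block, which represents the third physical storage location of the actual file in the data area; and so on, forming a singly linked list.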

CN1517906A
CLAIM 23
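. The file management method of claim 10, characterized in that: the database files comprise an index file and a data file [数据文件] (different schema) whose storage-space occupancy ratio is less than or equal to 1:3.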

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (数据文件) than the iterator corresponding to another particular data group , for that reducer .
CN1517906A
CLAIM 23
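. The file management method of claim 10, characterized in that: the database files comprise an index file and a data file [数据文件] (different schema) whose storage-space occupancy ratio is less than or equal to 1:3.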

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (数据文件) over a computer system , the method comprising : for a first data (第二个) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (逻辑组合) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
CN1517906A
CLAIM 1
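. A file system built on a storage medium, whose structure comprises a data area, which is the storage area for actual file data, and a metadata area, which is the storage area for the marker data that the file system uses to establish the mapping from physical data to logical data, characterized in that: the metadata area stores, for each actual file, a singly linked list of the physical locations at which that file's data is stored in the data area, so that the data of each actual file has a unique logical combination [逻辑组合] (first schema); and the actual files are managed by using a database to store the index address of the physical storage area of each singly linked list.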

CN1517906A
CLAIM 5
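. The file system of claim 4, characterized in that: each singly linked list consists of one or more index blocks in sequence; the index address is the address of the first index block, which represents the first physical storage location of the actual file in the data area; the first index block stores the address of the second index block, which represents the second [第二个] (first data, first data group, first data set, s corresponding data partition) physical storage location of the actual file in the data area; the second index block stores the address of the third index block, which represents the third physical storage location of the actual file in the data area; and so on, forming a singly linked list.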

CN1517906A
CLAIM 23
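. The file management method of claim 10, characterized in that: the database files comprise an index file and a data file [数据文件] (different schema) whose storage-space occupancy ratio is less than or equal to 1:3.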

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (第二个) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (逻辑组合) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition (第二个) to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (数据文件) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
CN1517906A
CLAIM 1
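. A file system built on a storage medium, whose structure comprises a data area, which is the storage area for actual file data, and a metadata area, which is the storage area for the marker data that the file system uses to establish the mapping from physical data to logical data, characterized in that: the metadata area stores, for each actual file, a singly linked list of the physical locations at which that file's data is stored in the data area, so that the data of each actual file has a unique logical combination [逻辑组合] (first schema); and the actual files are managed by using a database to store the index address of the physical storage area of each singly linked list.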

CN1517906A
CLAIM 5
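. The file system of claim 4, characterized in that: each singly linked list consists of one or more index blocks in sequence; the index address is the address of the first index block, which represents the first physical storage location of the actual file in the data area; the first index block stores the address of the second index block, which represents the second [第二个] (first data, first data group, first data set, s corresponding data partition) physical storage location of the actual file in the data area; the second index block stores the address of the third index block, which represents the third physical storage location of the actual file in the data area; and so on, forming a singly linked list.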

CN1517906A
CLAIM 23
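. The file management method of claim 10, characterized in that: the database files comprise an index file and a data file [数据文件] (different schema) whose storage-space occupancy ratio is less than or equal to 1:3.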




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040122803A1

Filed: 2002-12-19     Issued: 2004-06-24

Detect and qualify relationships between people and find the best path through the resulting social network

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Byron Dom, Joann Ruvolo, Geetika Tewari
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (last access) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (list information) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040122803A1
CLAIM 1
. A method of identifying relationships between users of a computerized network , said method comprising : extracting relationship information from databases in said network , said information comprising at least one of address book information , calendar information , event information , to-do list information (different lists) , journal information , and e-mail information ;
and evaluating said relationship information to produce relationship ratings of said users of said network .

US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (last access) is a plurality of output data groups .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (last access) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (last access) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (last access) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (last access) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (last access) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (last access) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (list information) of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040122803A1
CLAIM 1
. A method of identifying relationships between users of a computerized network , said method comprising : extracting relationship information from databases in said network , said information comprising at least one of address book information , calendar information , event information , to-do list information (different lists) , journal information , and e-mail information ;
and evaluating said relationship information to produce relationship ratings of said users of said network .

US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (last access) is a plurality of output data groups .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (last access) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (last access) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (last access) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (last access) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (last access) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (last access) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (last access) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040122803A1
CLAIM 10
. The method in claim 7 , wherein said evaluating comprises determining a time of a last access (data group) to establish how current relations are between said different users .




US8190610B2

JP2004178253A

Filed: 2002-11-27     Issued: 2004-06-24

Storage device control apparatus and control method for storage device control apparatus

(Original Assignee) Hitachi Ltd; 株式会社日立製作所     

Kenji Ishii, Toshio Komaki, Eiichi Sato, 都士夫 小牧, 健治 石井, 栄一 里
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2004178253A
CLAIM 1
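A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .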
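Illustrative sketch of the flow recited in US8190610B2 claim 1 (it is not taken from either charted document): two data groups with different schemas are partitioned, mapped by group-specific mapping functions, and merged in the reduce step on their common key. The record layouts, the key name `dept_id`, and every function name below are hypothetical, and the single-process loop only stands in for a distributed system.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key "dept_id".
employees = [{"emp": "ann", "dept_id": 1}, {"emp": "bob", "dept_id": 2}]
departments = [{"dept_id": 1, "dept": "ops"}, {"dept_id": 2, "dept": "dev"}]

def map_employee(rec):
    # Mapping function defined for the first data group.
    yield rec["dept_id"], rec["emp"]

def map_department(rec):
    # A different mapping function for the second data group.
    yield rec["dept_id"], rec["dept"]

def partition(records, n):
    """Split one data group into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def run_map(groups, n_partitions=2):
    """Map each partition independently and tag the output with its data group."""
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, mapper) in groups.items():
        for part in partition(records, n_partitions):
            for rec in part:
                for key, value in mapper(rec):
                    intermediate[key][group_name].append(value)
    return intermediate

def reduce_join(key, by_group):
    """Merge the per-group value lists that share the common key."""
    return [(key, emp, dept)
            for emp in by_group.get("employees", [])
            for dept in by_group.get("departments", [])]

def grouped_map_reduce(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output

result = grouped_map_reduce({
    "employees": (employees, map_employee),
    "departments": (departments, map_department),
})
```

Under these assumptions `result` holds one merged tuple per department key, with a layout that differs from both input schemas.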

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2004178253A
CLAIM 1
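A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .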

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2004178253A
CLAIM 1
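A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .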

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2004178253A
CLAIM 1
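A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .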

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2004178253A
CLAIM 1
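A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .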
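Illustrative sketch for US8190610B2 claim 5 (not drawn from either charted document): within one reduce call, each data group gets its own iterator over that group's intermediate values. The group names `orders` and `payments` and both function names are hypothetical.

```python
def make_iterators(grouped_values):
    """Return one iterator factory per data group, so each group can be
    scanned independently (and re-scanned) inside the reduce function."""
    return {group: (lambda vals=vals: iter(vals))
            for group, vals in grouped_values.items()}

def reduce_one_key(key, grouped_values):
    """Nest the per-group iterators: for every value of the first group,
    walk all values of the second group that share the same key."""
    iterators = make_iterators(grouped_values)
    out = []
    for order in iterators["orders"]():
        for payment in iterators["payments"]():
            out.append((key, order, payment))
    return out

joined = reduce_one_key("c1", {"orders": ["o1", "o2"], "payments": ["p1"]})
```

Nesting is only one arrangement; the iterators could equally be walked in lockstep or one after the other, depending on what the reduce function is meant to compute.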

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the method further comprises generating and providing metadata (値以下) for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2004178253A
CLAIM 6
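The storage device control apparatus according to claim 1 , comprising : means for monitoring the remaining storage capacity provided by the logical volume associated with the virtual volume ; and means for storing a new logical volume in association with the virtual volume when the remaining storage capacity provided by the logical volume already associated with the virtual volume falls to or below a threshold [閾値以下] (providing metadata) .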
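Illustrative sketch for US8190610B2 claim 9 (not drawn from either charted document): a reducer that performs sort, group-by-key and combine tasks while recording simple metadata about that work. The metadata fields and the use of summation as the combine step are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

def sort_group_combine(pairs):
    """Sort intermediate pairs, group them by key, combine each group,
    and return the combined result together with descriptive metadata."""
    metadata = {"input_pairs": len(pairs)}
    ordered = sorted(pairs, key=itemgetter(0))              # sort task
    grouped = {key: [value for _, value in group]           # group-by-key task
               for key, group in groupby(ordered, key=itemgetter(0))}
    combined = {key: sum(values) for key, values in grouped.items()}  # combine task
    metadata["distinct_keys"] = len(grouped)
    return combined, metadata

combined, metadata = sort_group_combine([("b", 1), ("a", 2), ("b", 3)])
```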

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2004178253A
CLAIM 1
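A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .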

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2004178253A
CLAIM 1
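A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .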

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2004178253A
CLAIM 1
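A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .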

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2004178253A
CLAIM 1
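A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .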

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2004178253A
CLAIM 1
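A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .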

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2004178253A
CLAIM 1
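A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .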

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2004178253A
CLAIM 1
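A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .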

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2004178253A
CLAIM 1
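A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .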

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2004178253A
CLAIM 1
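A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .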

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2004178253A
CLAIM 1
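A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .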

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task , the method further comprises generating and providing metadata (値以下) for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2004178253A
CLAIM 6
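The storage device control apparatus according to claim 1 , comprising : means for monitoring the remaining storage capacity provided by the logical volume associated with the virtual volume ; and means for storing a new logical volume in association with the virtual volume when the remaining storage capacity provided by the logical volume already associated with the virtual volume falls to or below a threshold [閾値以下] (providing metadata) .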

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2004178253A
CLAIM 1
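A storage device control apparatus characterized by comprising [備えること] (data group, first data group) : means for storing an association between a virtual volume , which is a volume set virtually so that an information processing apparatus can designate a storage area provided by a storage device , and a logical volume , which is a volume set logically in association with a storage area provided by the storage device ; means for receiving a data input/output request sent from the information processing apparatus and performing data input/output processing on the logical volume corresponding to the virtual volume specified in that data input/output request ; means for storing specifications required of the virtual volume ; means for storing specifications of the logical volume ; selection means for selecting a logical volume to be associated with the virtual volume by comparing the specifications required of the virtual volume with the specifications of the logical volume ; and means for storing the selected logical volume in association with the virtual volume .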

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata (値以下) for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2004178253A
CLAIM 6
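The storage device control apparatus according to claim 1 , comprising : means for monitoring the remaining storage capacity provided by the logical volume associated with the virtual volume ; and means for storing a new logical volume in association with the virtual volume when the remaining storage capacity provided by the logical volume already associated with the virtual volume falls to or below a threshold [閾値以下] (providing metadata) .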




US8190610B2

US20030149934A1

Filed: 2002-11-04     Issued: 2003-08-07

Computer program connecting the structure of a xml document to its underlying meaning

(Original Assignee) CHARTERIS PLC     (Current Assignee) CHARTERIS PLC

Robert Worden
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (same function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030149934A1
CLAIM 2
. The computer program of claim 1 which achieves some functionality using XML , in which the same function (mapping functions) ality can be achieved with different XML based languages by using a set of mappings appropriate to each language .

US20030149934A1
CLAIM 14
. The method of claim 13 adapted to allow runtime t (first data, first data group) ranslations , allowing the choice of the input and output XML languages to be made dynamically by the use of the appropriate mappings

US8190610B2
CLAIM 6
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (high level) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , each key/value pair of the intermediate data being provided to a separate one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .
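Illustrative sketch for US8190610B2 claim 6 (not drawn from either charted document): intermediate key/value pairs are partitioned so that each pair lands in exactly one partition, and each partition goes to its own reducer. The CRC32-based partitioner and the summing reducer are assumptions chosen only to keep the example deterministic.

```python
import zlib
from collections import defaultdict

def partition_of(key, n):
    """Deterministically assign a string key to one of n partitions."""
    return zlib.crc32(key.encode("utf-8")) % n

def split_intermediate(pairs, n):
    """Place each intermediate key/value pair in exactly one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[partition_of(key, n)].append((key, value))
    return parts

def reducer(part):
    """A separate reducer per partition; here it sums values per key."""
    totals = defaultdict(int)
    for key, value in part:
        totals[key] += value
    return dict(totals)

pairs = [("a", 1), ("b", 2), ("a", 3)]
parts = split_intermediate(pairs, 4)
outputs = [reducer(part) for part in parts]  # one reducer call per partition
```

Because the assignment depends only on the key, all values for a given key reach the same reducer.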

US8190610B2
CLAIM 7
. The method of claim 1 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (high level) is carried out by a plurality of reducers ;

the method further comprises : partitioning the intermediate data into a plurality of partitions , at least some of the key/value pairs of the intermediate data being provided to more than one of the partitions ;

and providing the intermediate data of each partition to a separate one of the reducers .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .
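Illustrative sketch for US8190610B2 claim 7 (not drawn from either charted document): some intermediate key/value pairs are provided to more than one partition. Here the pairs of a hypothetical small data group are replicated to every partition while the other pairs are hash-partitioned; the group names and the replication policy are assumptions.

```python
import zlib

def split_with_replication(pairs, n, replicated_groups):
    """Partition (group, key, value) triples; pairs from replicated groups
    are copied to all n partitions, the rest go to a single partition."""
    parts = [[] for _ in range(n)]
    for group, key, value in pairs:
        if group in replicated_groups:
            targets = range(n)                              # more than one partition
        else:
            targets = [zlib.crc32(key.encode("utf-8")) % n]  # exactly one partition
        for t in targets:
            parts[t].append((group, key, value))
    return parts

triples = [("small", "k1", "x"), ("big", "k1", "y"), ("big", "k2", "z")]
replicated = split_with_replication(triples, 3, {"small"})
```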

US8190610B2
CLAIM 10
. The method of claim 9 , wherein : the reducing step (high level) includes processing the metadata .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step (high level) .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step (high level) is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step (high level) further comprises processing data that is not intermediate data .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .
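Illustrative sketch for US8190610B2 claim 13 (not drawn from either charted document): the reduce step processes data that is not intermediate data alongside the mapped values. The lookup table `side_data` and the function name are hypothetical.

```python
def reduce_with_side_data(key, values, side_data):
    """Combine intermediate values for a key with non-intermediate data
    (a lookup table that did not pass through the map step)."""
    label = side_data.get(key, "unknown")
    return (key, label, sum(values))

row = reduce_with_side_data("k1", [1, 2], {"k1": "gold"})
```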

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step (high level) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step (high level) is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step (high level) includes relating the data among the plurality of data groups .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (same function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030149934A1
CLAIM 2
. The computer program of claim 1 which achieves some functionality using XML , in which the same function (mapping functions) ality can be achieved with different XML based languages by using a set of mappings appropriate to each language .

US20030149934A1
CLAIM 14
. The method of claim 13 adapted to allow runtime t (first data, first data group) ranslations , allowing the choice of the input and output XML languages to be made dynamically by the use of the appropriate mappings

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (same function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030149934A1
CLAIM 2
. The computer program of claim 1 which achieves some functionality using XML , in which the same function (mapping functions) ality can be achieved with different XML based languages by using a set of mappings appropriate to each language .

US20030149934A1
CLAIM 14
. The method of claim 13 adapted to allow runtime t (first data, first data group) ranslations , allowing the choice of the input and output XML languages to be made dynamically by the use of the appropriate mappings

US20030149934A1
CLAIM 25
. A method of creating a XML-based language comprising the following steps (second set, reduce method) : (a) creating a business information model (b) defining requirements for an XML-based language in terms of classes , attributes and relations in the business information model that need to be represented in documents in the language (c) automatically generating a schema definition of the XML-based language which meets those requirements , applying automatically various choices as to how different pieces of business information in the requirement are to be represented in XML .
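
The map and reduce steps recited in claim 33 of US8190610B2 above can be illustrated with a minimal, single-process sketch. Everything in it is an assumption made for illustration and is not taken from the patent or from the cited reference: the two data groups (`EMPLOYEES`, `DEPARTMENTS`), their schemas, the shared key (a department id), and the helper names `partition`, `run_maps`, and `reduce_together`. A real system would run the map and reduce functions on separate machines.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share one key (dept id).
EMPLOYEES = [(1, ("ann", 100)), (2, ("bob", 80)), (1, ("cy", 120))]  # key -> (name, salary)
DEPARTMENTS = [(1, ("eng",)), (2, ("ops",))]                          # key -> (dept_name,)


def partition(pairs, n):
    """Split key-value pairs into n data partitions by key hash."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[hash(key) % n].append((key, value))
    return parts


def map_employee(part):
    """Map function for the first group: emit (dept id, salary)."""
    return [(dept_id, salary) for dept_id, (_name, salary) in part]


def map_department(part):
    """Map function for the second group: emit (dept id, department name)."""
    return [(dept_id, dept_name) for dept_id, (dept_name,) in part]


def run_maps(pairs, map_fn, n=2):
    """Apply map_fn to every partition and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(pairs, n):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return dict(intermediate)


def reduce_together(first, second):
    """Reduce both intermediate sets on their common keys, one iterator per group."""
    output = []
    for key in sorted(first.keys() & second.keys()):
        salaries = iter(first[key])      # iterator over the first group's values
        dept_names = iter(second[key])   # iterator over the second group's values
        output.append((key, (next(dept_names), sum(salaries))))
    return output


first = run_maps(EMPLOYEES, map_employee)
second = run_maps(DEPARTMENTS, map_department)
result = reduce_together(first, second)
print(result)
```

The output pairs here carry (department name, total salary), a schema that differs from both input schemas, which is the property the final limitation of the claim describes.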

US8190610B2
CLAIM 39
. The map-reduce method of claim 38 , wherein iterating includes providing the associated metadata to the processing of the reducing step (high level) .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (same function) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030149934A1
CLAIM 2
. The computer program of claim 1 which achieves some functionality using XML , in which the same function (mapping functions) ality can be achieved with different XML based languages by using a set of mappings appropriate to each language .

US20030149934A1
CLAIM 14
. The method of claim 13 adapted to allow runtime t (first data, first data group) ranslations , allowing the choice of the input and output XML languages to be made dynamically by the use of the appropriate mappings

US20030149934A1
CLAIM 25
. A method of creating a XML-based language comprising the following steps (second set, reduce method) : (a) creating a business information model (b) defining requirements for an XML-based language in terms of classes , attributes and relations in the business information model that need to be represented in documents in the language (c) automatically generating a schema definition of the XML-based language which meets those requirements , applying automatically various choices as to how different pieces of business information in the requirement are to be represented in XML .

US8190610B2
CLAIM 46
. The computer system of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step (high level) .
US20030149934A1
CLAIM 9
. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level (reducing step) language which accesses or creates documents in XML based languages from the structure of those XML based languages .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040088147A1

Filed: 2002-10-31     Issued: 2004-05-06

Global data placement

(Original Assignee) Hewlett Packard Development Co LP     (Current Assignee) Valtrus Innovations Ltd ; Hewlett Packard Enterprise Development LP

Qian Wang, Arif Merchant, Nina Mishra, Mahesh Kallahalla, Ram Swaminathan
US8190610B2
CLAIM 1
. A method of processing data (bandwidth constraint, first edge) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040088147A1
CLAIM 11
. The method of claim 7 , wherein one or more of the compute server nodes are split into at least a first , second and third levels defining two edges , the first edge (processing data, computing devices) being between first and second levels of nodes and the second edge being between the second and third levels of nodes , wherein one of the first and second edges includes a computation capacity constraint and the other of the first and second edges includes a memory capacity constraint .

US20040088147A1
CLAIM 25
. The method of claim 24 , wherein edge capacity constraints include : at least one of a computation capacity constraint a memory capacity constraint for an edge leading to a compute server node ;
and at least one of a storage capacity constraint a bandwidth constraint (processing data, computing devices) for an edge leading to a storage server node .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (upper limit) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040088147A1
CLAIM 8
. The method of claim 7 , wherein : the flow conservation is defined as $\sum_j x_{ij}^k - \sum_j x_{ji}^k = 0$ , where $\sum_j x_{ij}^k$ represents the sum of outgoing flow of a commodity k from node i and $\sum_j x_{ji}^k$ represents the sum of incoming flow of the commodity k into node i ;
the edge constraint is defined as $\sum_k w_{ij}^k x_{ij}^k \le U_{ij}$ , where $w_{ij}^k$ represents a weight of the commodity k flowing through edge from node i to node j (or edge ij) , $x_{ij}^k$ represents the amount flow of commodity k through the edge ij , and $U_{ij}$ represents the upper limit (different key) on capacity of the edge ij ;
and individual commodity constraint is defined as $l_{ij}^k \le x_{ij}^k \le u_{ij}^k$ , where $l_{ij}^k$ and $u_{ij}^k$ represent lower and upper bounds on the capacity of the commodity k through the edge ij .
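
The three constraints recited in claim 8 of US20040088147A1 above (flow conservation, edge capacity, and per-commodity bounds) can be checked numerically with a short sketch. The three-node path `s -> a -> t`, the single commodity `0`, all numeric values, and the function names are hypothetical and chosen only to illustrate the inequalities; they do not come from the reference.

```python
def flow_conserved(x, node, k):
    """True if outgoing minus incoming flow of commodity k at node is zero."""
    outgoing = sum(v for (i, _j, c), v in x.items() if i == node and c == k)
    incoming = sum(v for (_i, j, c), v in x.items() if j == node and c == k)
    return outgoing - incoming == 0


def edge_within_capacity(x, w, U, edge):
    """True if the weighted flow over all commodities on edge is at most U[edge]."""
    load = sum(w[(i, j, c)] * v for (i, j, c), v in x.items() if (i, j) == edge)
    return load <= U[edge]


def commodity_within_bounds(x, lower, upper):
    """True if every per-commodity flow lies between its lower and upper bound."""
    return all(lower[e] <= v <= upper[e] for e, v in x.items())


# Hypothetical flows x[(i, j, k)] on the path s -> a -> t for commodity 0.
x = {("s", "a", 0): 3, ("a", "t", 0): 3}
w = {e: 1 for e in x}                      # unit weight per commodity
U = {("s", "a"): 5, ("a", "t"): 2}         # edge capacities
lower = {e: 0 for e in x}
upper = {e: 4 for e in x}

print(flow_conserved(x, "a", 0))                  # conservation at intermediate node a
print(edge_within_capacity(x, w, U, ("s", "a")))  # capacity check on edge s -> a
print(edge_within_capacity(x, w, U, ("a", "t")))  # capacity check on edge a -> t
print(commodity_within_bounds(x, lower, upper))
```

With these assumed values the flow is conserved at node `a` and respects the per-commodity bounds, but the edge `a -> t` carries 3 units against an upper limit of 2, so the edge constraint fails there.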

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (bandwidth constraint, first edge) that is not intermediate data .
US20040088147A1
CLAIM 11
. The method of claim 7 , wherein one or more of the compute server nodes are split into at least a first , second and third levels defining two edges , the first edge (processing data, computing devices) being between first and second levels of nodes and the second edge being between the second and third levels of nodes , wherein one of the first and second edges includes a computation capacity constraint and the other of the first and second edges includes a memory capacity constraint .

US20040088147A1
CLAIM 25
. The method of claim 24 , wherein edge capacity constraints include : at least one of a computation capacity constraint a memory capacity constraint for an edge leading to a compute server node ;
and at least one of a storage capacity constraint a bandwidth constraint (processing data, computing devices) for an edge leading to a storage server node .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (bandwidth constraint, first edge) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040088147A1
CLAIM 11
. The method of claim 7 , wherein one or more of the compute server nodes are split into at least a first , second and third levels defining two edges , the first edge (processing data, computing devices) being between first and second levels of nodes and the second edge being between the second and third levels of nodes , wherein one of the first and second edges includes a computation capacity constraint and the other of the first and second edges includes a memory capacity constraint .

US20040088147A1
CLAIM 25
. The method of claim 24 , wherein edge capacity constraints include : at least one of a computation capacity constraint a memory capacity constraint for an edge leading to a compute server node ;
and at least one of a storage capacity constraint a bandwidth constraint (processing data, computing devices) for an edge leading to a storage server node .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (upper limit) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20040088147A1
CLAIM 8
. The method of claim 7 , wherein : the flow conservation is defined as $\sum_j x_{ij}^k - \sum_j x_{ji}^k = 0$ , where $\sum_j x_{ij}^k$ represents the sum of outgoing flow of a commodity k from node i and $\sum_j x_{ji}^k$ represents the sum of incoming flow of the commodity k into node i ;
the edge constraint is defined as $\sum_k w_{ij}^k x_{ij}^k \le U_{ij}$ , where $w_{ij}^k$ represents a weight of the commodity k flowing through edge from node i to node j (or edge ij) , $x_{ij}^k$ represents the amount flow of commodity k through the edge ij , and $U_{ij}$ represents the upper limit (different key) on capacity of the edge ij ;
and individual commodity constraint is defined as $l_{ij}^k \le x_{ij}^k \le u_{ij}^k$ , where $l_{ij}^k$ and $u_{ij}^k$ represent lower and upper bounds on the capacity of the commodity k through the edge ij .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (bandwidth constraint, first edge) that is not intermediate data .
US20040088147A1
CLAIM 11
. The method of claim 7 , wherein one or more of the compute server nodes are split into at least a first , second and third levels defining two edges , the first edge (processing data, computing devices) being between first and second levels of nodes and the second edge being between the second and third levels of nodes , wherein one of the first and second edges includes a computation capacity constraint and the other of the first and second edges includes a memory capacity constraint .

US20040088147A1
CLAIM 25
. The method of claim 24 , wherein edge capacity constraints include : at least one of a computation capacity constraint a memory capacity constraint for an edge leading to a compute server node ;
and at least one of a storage capacity constraint a bandwidth constraint (processing data, computing devices) for an edge leading to a storage server node .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (bandwidth constraint, first edge) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set ( ∑) having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040088147A1
CLAIM 8
. The method of claim 7 , wherein : the flow conservation is defined as $\sum_j x_{ij}^k - \sum_j x_{ji}^k = 0$ , where $\sum_j x_{ij}^k$ (first intermediate data set) represents the sum of outgoing flow of a commodity k from node i and $\sum_j x_{ji}^k$ represents the sum of incoming flow of the commodity k into node i ;
the edge constraint is defined as $\sum_k w_{ij}^k x_{ij}^k \le U_{ij}$ , where $w_{ij}^k$ represents a weight of the commodity k flowing through edge from node i to node j (or edge ij) , $x_{ij}^k$ represents the amount flow of commodity k through the edge ij , and $U_{ij}$ represents the upper limit on capacity of the edge ij ;
and individual commodity constraint is defined as $l_{ij}^k \le x_{ij}^k \le u_{ij}^k$ , where $l_{ij}^k$ and $u_{ij}^k$ represent lower and upper bounds on the capacity of the commodity k through the edge ij .

US20040088147A1
CLAIM 11
. The method of claim 7 , wherein one or more of the compute server nodes are split into at least a first , second and third levels defining two edges , the first edge (processing data, computing devices) being between first and second levels of nodes and the second edge being between the second and third levels of nodes , wherein one of the first and second edges includes a computation capacity constraint and the other of the first and second edges includes a memory capacity constraint .

US20040088147A1
CLAIM 17
. The method of claim 16 , further comprising at least one of : defining a binary store-cap flow associated with each store wherein value (output data set) s of the binary store-cap flow on an edge between the storage server node and store node is 0/1 to indicate whether or not a replica for a particular store may be stored on the storage server ;
and defining a binary workload-cap flow for a workload wherein values of the binary store-cap flow is 0/1 to indicate whether or not the workload is assigned to a particular compute server .

US20040088147A1
CLAIM 20
. A method of allocating resources in a network , comprising : modeling a source and a sink for each data stream of the network ;
modeling intermediate nodes including one (second set) or more workload nodes , one or more compute server nodes , and one or more storage server nodes such at each workload node is connected to only one of the one or more one or more compute server nodes and such that each compute server node is connected to at least one of the one or more storage server nodes ;
connecting the source for each data stream to at least one of the one or more workload nodes ;
and connecting the source for each data stream to at least one of the one or more storage server nodes .

US20040088147A1
CLAIM 25
. The method of claim 24 , wherein edge capacity constraints include : at least one of a computation capacity constraint a memory capacity constraint for an edge leading to a compute server node ;
and at least one of a storage capacity constraint a bandwidth constraint (processing data, computing devices) for an edge leading to a storage server node .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20040088147A1
CLAIM 17
. The method of claim 16 , further comprising at least one of : defining a binary store-cap flow associated with each store wherein value (output data set) s of the binary store-cap flow on an edge between the storage server node and store node is 0/1 to indicate whether or not a replica for a particular store may be stored on the storage server ;
and defining a binary workload-cap flow for a workload wherein values of the binary store-cap flow is 0/1 to indicate whether or not the workload is assigned to a particular compute server .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (bandwidth constraint, first edge) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set ( ∑) having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040088147A1
CLAIM 8
. The method of claim 7 , wherein : the flow conservation is defined as $\sum_j x_{ij}^k - \sum_j x_{ji}^k = 0$ , where $\sum_j x_{ij}^k$ (first intermediate data set) represents the sum of outgoing flow of a commodity k from node i and $\sum_j x_{ji}^k$ represents the sum of incoming flow of the commodity k into node i ;
the edge constraint is defined as $\sum_k w_{ij}^k x_{ij}^k \le U_{ij}$ , where $w_{ij}^k$ represents a weight of the commodity k flowing through edge from node i to node j (or edge ij) , $x_{ij}^k$ represents the amount flow of commodity k through the edge ij , and $U_{ij}$ represents the upper limit on capacity of the edge ij ;
and individual commodity constraint is defined as $l_{ij}^k \le x_{ij}^k \le u_{ij}^k$ , where $l_{ij}^k$ and $u_{ij}^k$ represent lower and upper bounds on the capacity of the commodity k through the edge ij .

US20040088147A1
CLAIM 11
. The method of claim 7 , wherein one or more of the compute server nodes are split into at least a first , second and third levels defining two edges , the first edge (processing data, computing devices) being between first and second levels of nodes and the second edge being between the second and third levels of nodes , wherein one of the first and second edges includes a computation capacity constraint and the other of the first and second edges includes a memory capacity constraint .

US20040088147A1
CLAIM 17
. The method of claim 16 , further comprising at least one of : defining a binary store-cap flow associated with each store wherein value (output data set) s of the binary store-cap flow on an edge between the storage server node and store node is 0/1 to indicate whether or not a replica for a particular store may be stored on the storage server ;
and defining a binary workload-cap flow for a workload wherein values of the binary store-cap flow is 0/1 to indicate whether or not the workload is assigned to a particular compute server .

US20040088147A1
CLAIM 20
. A method of allocating resources in a network , comprising : modeling a source and a sink for each data stream of the network ;
modeling intermediate nodes including one (second set) or more workload nodes , one or more compute server nodes , and one or more storage server nodes such at each workload node is connected to only one of the one or more one or more compute server nodes and such that each compute server node is connected to at least one of the one or more storage server nodes ;
connecting the source for each data stream to at least one of the one or more workload nodes ;
and connecting the source for each data stream to at least one of the one or more storage server nodes .

US20040088147A1
CLAIM 25
. The method of claim 24 , wherein edge capacity constraints include : at least one of a computation capacity constraint a memory capacity constraint for an edge leading to a compute server node ;
and at least one of a storage capacity constraint a bandwidth constraint (processing data, computing devices) for an edge leading to a storage server node .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20040088147A1
CLAIM 17
. The method of claim 16 , further comprising at least one of : defining a binary store-cap flow associated with each store wherein value (output data set) s of the binary store-cap flow on an edge between the storage server node and store node is 0/1 to indicate whether or not a replica for a particular store may be stored on the storage server ;
and defining a binary workload-cap flow for a workload wherein values of the binary store-cap flow is 0/1 to indicate whether or not the workload is assigned to a particular compute server .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US7047253B1

Filed: 2002-09-27     Issued: 2006-05-16

Mechanisms for storing content and properties of hierarchically organized resources

(Original Assignee) Oracle International Corp     (Current Assignee) Oracle International Corp

Ravi Murthy, Eric Sedlar, Nipun Agarwal, Neema Jalali
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (have values) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US7047253B1
CLAIM 10
. The method of claim 9 wherein : the one or more structures include a table that stores values for the metadata attributes associated with a particular resource ;
the one or more XML schemas include a particular XML schema that indicates metadata attributes that apply to said particular resource ;
the method further comprising the steps of : receiving input (different schema) that represents a change to said particular XML schema ;
and in response to said input , modifying the structure of said table .

US7047253B1
CLAIM 17
. The method of claim 9 wherein : the step of determining , based on one or more XML schemas , which metadata attributes to store for said resources includes determining , based on said one or more XML schemas , that said resources may have values (mapping functions) for metadata attributes that are not explicitly declared in said one or more XML schemas ;
the step of creating one or more structures , within said database , to store said metadata attributes based on said one or more XML schemas includes creating a table that includes a catch-all column for storing data that corresponds to metadata attributes that are not explicitly declared in said one or more schemas ;
and the step of storing , within said one or more structures , values for the metadata attributes associated with said resources includes storing , within said catch-all column , values for metadata attributes that are not explicitly declared in said one or more schemas .

US7047253B1
CLAIM 20
. The method of claim 9 further comprising the steps of : receiving data associated with a resource ;
identifying an XML schema that dictates the metadata attributes that apply to said resource ;
detecting that said resource includes first data (first data) associated with metadata attributes that are expressly identified in said XML schema ;
and second data associated with metadata attributes that are not expressly identified in said XML schema ;
storing the first data in columns that correspond to said metadata attributes that are expressly identified in said XML schema ;
and storing the second data in a catch-all column .

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the method further comprises generating and providing metadata (stores metadata) for at least some of the mapping , partitioning , combining , grouping and sorting .
US7047253B1
CLAIM 5
. A computer-implemented method for managing data stored in a database system , the method comprising the steps of : storing information in content structures that are separate from hierarchy structures ;
storing metadata for a plurality of resources in said hierarchy structures , wherein said metadata includes : location data , associated with a given resource of said plurality of resources , that identifies which information in said content structures represents content of said given resource ;
and hierarchy data that indicates a position , within an information hierarchy , of each of said resources ;
and wherein : a first table that has a row corresponding to each resource in the information hierarchy , wherein each row stores metadata (providing metadata) about the resource to which the row corresponds ;
and a second table that identifies parent-child relationships of the resources that belong to said information hierarchy .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
US7047253B1
CLAIM 10
. The method of claim 9 wherein : the one or more structures include a table that stores values for the metadata attributes associated with a particular resource ;
the one or more XML schemas include a particular XML schema that indicates metadata attributes that apply to said particular resource ;
the method further comprising the steps of : receiving input (different schema) that represents a change to said particular XML schema ;
and in response to said input , modifying the structure of said table .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (have values) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US7047253B1
CLAIM 10
. The method of claim 9 wherein : the one or more structures include a table that stores values for the metadata attributes associated with a particular resource ;
the one or more XML schemas include a particular XML schema that indicates metadata attributes that apply to said particular resource ;
the method further comprising the steps of : receiving input (different schema) that represents a change to said particular XML schema ;
and in response to said input , modifying the structure of said table .

US7047253B1
CLAIM 17
. The method of claim 9 wherein : the step of determining , based on one or more XML schemas , which metadata attributes to store for said resources includes determining , based on said one or more XML schemas , that said resources may have values (mapping functions) for metadata attributes that are not explicitly declared in said one or more XML schemas ;
the step of creating one or more structures , within said database , to store said metadata attributes based on said one or more XML schemas includes creating a table that includes a catch-all column for storing data that corresponds to metadata attributes that are not explicitly declared in said one or more schemas ;
and the step of storing , within said one or more structures , values for the metadata attributes associated with said resources includes storing , within said catch-all column , values for metadata attributes that are not explicitly declared in said one or more schemas .

US7047253B1
CLAIM 20
. The method of claim 9 further comprising the steps of : receiving data associated with a resource ;
identifying an XML schema that dictates the metadata attributes that apply to said resource ;
detecting that said resource includes first data (first data) associated with metadata attributes that are expressly identified in said XML schema ;
and second data associated with metadata attributes that are not expressly identified in said XML schema ;
storing the first data in columns that correspond to said metadata attributes that are expressly identified in said XML schema ;
and storing the second data in a catch-all column .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
US7047253B1
CLAIM 10
. The method of claim 9 wherein : the one or more structures include a table that stores values for the metadata attributes associated with a particular resource ;
the one or more XML schemas include a particular XML schema that indicates metadata attributes that apply to said particular resource ;
the method further comprising the steps of : receiving input (different schema) that represents a change to said particular XML schema ;
and in response to said input , modifying the structure of said table .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (receiving input) over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (have values) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US7047253B1
CLAIM 10
. The method of claim 9 wherein : the one or more structures include a table that stores values for the metadata attributes associated with a particular resource ;
the one or more XML schemas include a particular XML schema that indicates metadata attributes that apply to said particular resource ;
the method further comprising the steps of : receiving input (different schema) that represents a change to said particular XML schema ;
and in response to said input , modifying the structure of said table .

US7047253B1
CLAIM 17
. The method of claim 9 wherein : the step of determining , based on one or more XML schemas , which metadata attributes to store for said resources includes determining , based on said one or more XML schemas , that said resources may have values (mapping functions) for metadata attributes that are not explicitly declared in said one or more XML schemas ;
the step of creating one or more structures , within said database , to store said metadata attributes based on said one or more XML schemas includes creating a table that includes a catch-all column for storing data that corresponds to metadata attributes that are not explicitly declared in said one or more schemas ;
and the step of storing , within said one or more structures , values for the metadata attributes associated with said resources includes storing , within said catch-all column , values for metadata attributes that are not explicitly declared in said one or more schemas .

US7047253B1
CLAIM 20
. The method of claim 9 further comprising the steps of : receiving data associated with a resource ;
identifying an XML schema that dictates the metadata attributes that apply to said resource ;
detecting that said resource includes first data (first data) associated with metadata attributes that are expressly identified in said XML schema ;
and second data associated with metadata attributes that are not expressly identified in said XML schema ;
storing the first data in columns that correspond to said metadata attributes that are expressly identified in said XML schema ;
and storing the second data in a catch-all column .

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task , the method further comprises generating and providing metadata (stores metadata) for at least some of the mapping , partitioning , combining , grouping and sorting .
US7047253B1
CLAIM 5
. A computer-implemented method for managing data stored in a database system , the method comprising the steps of : storing information in content structures that are separate from hierarchy structures ;
storing metadata for a plurality of resources in said hierarchy structures , wherein said metadata includes : location data , associated with a given resource of said plurality of resources , that identifies which information in said content structures represents content of said given resource ;
and hierarchy data that indicates a position , within an information hierarchy , of each of said resources ;
and wherein : a first table that has a row corresponding to each resource in the information hierarchy , wherein each row stores metadata (providing metadata) about the resource to which the row corresponds ;
and a second table that identifies parent-child relationships of the resources that belong to said information hierarchy .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (have values) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (receiving input) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US7047253B1
CLAIM 10
. The method of claim 9 wherein : the one or more structures include a table that stores values for the metadata attributes associated with a particular resource ;
the one or more XML schemas include a particular XML schema that indicates metadata attributes that apply to said particular resource ;
the method further comprising the steps of : receiving input (different schema) that represents a change to said particular XML schema ;
and in response to said input , modifying the structure of said table .

US7047253B1
CLAIM 17
. The method of claim 9 wherein : the step of determining , based on one or more XML schemas , which metadata attributes to store for said resources includes determining , based on said one or more XML schemas , that said resources may have values (mapping functions) for metadata attributes that are not explicitly declared in said one or more XML schemas ;
the step of creating one or more structures , within said database , to store said metadata attributes based on said one or more XML schemas includes creating a table that includes a catch-all column for storing data that corresponds to metadata attributes that are not explicitly declared in said one or more schemas ;
and the step of storing , within said one or more structures , values for the metadata attributes associated with said resources includes storing , within said catch-all column , values for metadata attributes that are not explicitly declared in said one or more schemas .

US7047253B1
CLAIM 20
. The method of claim 9 further comprising the steps of : receiving data associated with a resource ;
identifying an XML schema that dictates the metadata attributes that apply to said resource ;
detecting that said resource includes first data (first data) associated with metadata attributes that are expressly identified in said XML schema ;
and second data associated with metadata attributes that are not expressly identified in said XML schema ;
storing the first data in columns that correspond to said metadata attributes that are expressly identified in said XML schema ;
and storing the second data in a catch-all column .

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata (stores metadata) for at least some of the mapping , partitioning , combining , grouping and sorting .
US7047253B1
CLAIM 5
. A computer-implemented method for managing data stored in a database system , the method comprising the steps of : storing information in content structures that are separate from hierarchy structures ;
storing metadata for a plurality of resources in said hierarchy structures , wherein said metadata includes : location data , associated with a given resource of said plurality of resources , that identifies which information in said content structures represents content of said given resource ;
and hierarchy data that indicates a position , within an information hierarchy , of each of said resources ;
and wherein : a first table that has a row corresponding to each resource in the information hierarchy , wherein each row stores metadata (providing metadata) about the resource to which the row corresponds ;
and a second table that identifies parent-child relationships of the resources that belong to said information hierarchy .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1294144A2

Filed: 2002-09-17     Issued: 2003-03-19

System and method for router data distribution

(Original Assignee) Chiaro Networks Ltd     (Current Assignee) CHIARO NETWORKS LTD. ; Chiaro Networks Ltd

Steve M. Simmons, Jim Kleiner, Qiang Li, Bing Liu, Lance Arnold Visser
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1294144A2
CLAIM 7
The protocol of claim 6 wherein timer intervals of said timers (second data, second data group) in said distributor are not integral multiples of one another , thereby avoiding cyclical patterns , such that message load balancing is improved and message storms are minimized .
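
To make the charted elements of US8190610B2 claim 1 easier to follow, the sketch below walks through them in a single process: two data groups with different schemas are mapped by different functions, the intermediate values stay identifiable to their group, and one reduce step merges them on the common key. It is an illustrative assumption, not the patentee's implementation and not a distributed system. The employee and department schemas, the function names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def map_employee(record):
    # Assumed schema for the first data group: (emp_id, name, dept_id).
    emp_id, name, dept_id = record
    yield dept_id, name


def map_department(record):
    # Assumed schema for the second data group: (dept_id, dept_name).
    dept_id, dept_name = record
    yield dept_id, dept_name


def run_map(groups, mappers):
    """Apply each group's own map function to each of its partitions.

    Returns intermediate[key][group] -> list of values, so every value
    remains identifiable to the data group that produced it.
    """
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, partitions in groups.items():
        for partition in partitions:
            for record in partition:
                for key, value in mappers[group](record):
                    intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, per_group):
    """Merge the two groups' intermediate values that share a key."""
    dept_names = list(per_group.get("department", []))
    for name in per_group.get("employee", []):  # iterate over group 1 values
        for dept_name in dept_names:            # iterate over group 2 values
            yield (key, name, dept_name)


groups = {
    "employee": [[(1, "Ann", 10), (2, "Bob", 20)], [(3, "Cy", 10)]],
    "department": [[(10, "Sales")], [(20, "Ops")]],
}
mappers = {"employee": map_employee, "department": map_department}

intermediate = run_map(groups, mappers)
output = sorted(
    row for key, per_group in intermediate.items() for row in reduce_join(key, per_group)
)
```

The output rows have a schema (dept_id, name, dept_name) that differs from both input schemas, which is the join-like merging the claim language describes.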

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (one source) group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP1294144A2
CLAIM 1
In a router network , a reliable method of broadcasting dynamically changing routing tables incrementally from at least one source (particular data, particular data group) to multiple consumers in accordance with a protocol , comprising : maintaining a copy of the current contents of said routing tables on said at least one source ;
communicating said dynamic changes in said routing tables from said at least one source to a single active distributor ;
buffering said dynamic changes at said distributor ;
and broadcasting said dynamic changes in messages from said distributor to said multiple consumers , such that said broadcast dynamic changes are received and applied consistently across said multiple consumers , and such that said messages from said distributor to said multiple consumers are paced at said distributor , such that message congestion and message storms are avoided .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1294144A2
CLAIM 7
The protocol of claim 6 wherein timer intervals of said timers (second data, second data group) in said distributor are not integral multiples of one another , thereby avoiding cyclical patterns , such that message load balancing is improved and message storms are minimized .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (one source) group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP1294144A2
CLAIM 1
In a router network , a reliable method of broadcasting dynamically changing routing tables incrementally from at least one source (particular data, particular data group) to multiple consumers in accordance with a protocol , comprising : maintaining a copy of the current contents of said routing tables on said at least one source ;
communicating said dynamic changes in said routing tables from said at least one source to a single active distributor ;
buffering said dynamic changes at said distributor ;
and broadcasting said dynamic changes in messages from said distributor to said multiple consumers , such that said broadcast dynamic changes are received and applied consistently across said multiple consumers , and such that said messages from said distributor to said multiple consumers are paced at said distributor , such that message congestion and message storms are avoided .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1294144A2
CLAIM 4
The method of claim 1 wherein said distributor comprises a multi-threaded server which communicates bidirectionally with said at least one source on one thread , which communicates bidirectionally with said multiple consumers on a second thread , and which operates timers on a third thread , said server (second set) having as clients said at least one source and said multiple consumers .

EP1294144A2
CLAIM 7
The protocol of claim 6 wherein timer intervals of said timers (second data, second data group) in said distributor are not integral multiples of one another , thereby avoiding cyclical patterns , such that message load balancing is improved and message storms are minimized .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1294144A2
CLAIM 4
The method of claim 1 wherein said distributor comprises a multi-threaded server which communicates bidirectionally with said at least one source on one thread , which communicates bidirectionally with said multiple consumers on a second thread , and which operates timers on a third thread , said server (second set) having as clients said at least one source and said multiple consumers .

EP1294144A2
CLAIM 7
The protocol of claim 6 wherein timer intervals of said timers (second data, second data group) in said distributor are not integral multiples of one another , thereby avoiding cyclical patterns , such that message load balancing is improved and message storms are minimized .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040030703A1

Filed: 2002-08-12     Issued: 2004-02-12

Method, system, and program for merging log entries from multiple recovery log files

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Serge Bourbonnais, Elizabeth Hamel, Bruce Lindsay, Chengfei Liu, Jens Stankiewitz, Tuong Truong
US8190610B2
CLAIM 1
. A method of processing data (several data) of a data set over a distributed system , wherein the data set comprises a plurality of data groups (several data) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (comprises instructions) are performed by a distributed system .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several database (data groups, output data groups, processing data, output data set) recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causally ordered , ascending timestamp .

US20040030703A1
CLAIM 24
. A system for merging log entries from multiple recovery logs , comprising : a set of nodes , each node having a recovery log ;
and a computer program executable by a computer , wherein the computer program comprises instructions (reducing operations) for : recording local transactions within each recovery log using a local transaction identifier ;
and merging local transactions to form global transactions across the multiple recovery logs using global transaction identifiers .

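For orientation only, the partition, map, and reduce steps recited in claim 1 of US8190610B2 above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the patentee's implementation and not a distributed system. The sample records, the group names `employee` and `department`, the shared key `dept_id`, and every function name are assumptions introduced for the example.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share the key "dept_id".
employees = [("e1", {"name": "Ann", "dept_id": 10}),
             ("e2", {"name": "Bob", "dept_id": 20})]
departments = [("d1", {"dept_id": 10, "title": "Sales"}),
               ("d2", {"dept_id": 20, "title": "Ops"})]


def partition(records, n):
    """Split one data group into n partitions of key-value pairs."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    # Map function for the employee schema: emit (common key, name).
    return [(value["dept_id"], value["name"]) for _, value in part]


def map_department(part):
    # Map function for the department schema: emit (common key, title).
    return [(value["dept_id"], value["title"]) for _, value in part]


def run_map(groups, n_partitions=2):
    """Apply each group's own map function and keep the intermediate
    data identifiable to the group that produced it."""
    intermediate = defaultdict(lambda: defaultdict(list))  # key -> group -> values
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Iterate over each group's values separately and merge them
    on the common key."""
    return [(key, name, title)
            for name in values_by_group.get("employee", [])
            for title in values_by_group.get("department", [])]


def grouped_map_reduce(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


result = grouped_map_reduce({
    "employee": (employees, map_employee),
    "department": (departments, map_department),
})
print(result)  # [(10, 'Ann', 'Sales'), (20, 'Bob', 'Ops')]
```

In this sketch the nested dictionary stands in for intermediate data that is "identifiable to that data group", and the two comprehension loops in `reduce_join` stand in for per-group iterators. How the claim terms should actually be construed is a matter for expert analysis.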
US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (several data) .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (several data) that is not intermediate data .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (several data) .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (several data) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group is a plurality of output data groups (several data) .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (several data) that is not intermediate data .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 32
. The computer system of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (several data) .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (several data) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (several data) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (comprises instructions) are performed by a distributed system .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US20040030703A1
CLAIM 24
. A system for merging log entries from multiple recovery logs , comprising : a set of nodes , each node having a recovery log ;
and a computer program executable by a computer , wherein the computer program comprises instructions (reducing operations) for : recording local transactions within each recovery log using a local transaction identifier ;
and merging local transactions to form global transactions across the multiple recovery logs using global transaction identifiers .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (several data) is a merging of a portion of the first and second intermediate data set .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (several data) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (several data) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (several data) is a merging of a portion of the first and second intermediate data set .
US20040030703A1
CLAIM 21
. A method to coordinate processing of complete transactions with specific log entries of each of several data (data groups, output data groups, processing data, output data set) base recovery logs , comprising : processing complete transactions ;
recording an address in each log entry of an earliest reported log entry for a transaction which is not complete and not yet processed along with the causally ordered , ascending timestamp of the log entry of the commit kind for the most recently completed and processed transaction ;
and atomically committing the changes pursuant to the processing of completed transactions , the earliest reported entries for incomplete transactions , and the causually ordered , ascending timestamp .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2004038313A

Filed: 2002-06-28     Issued: 2004-02-05

Log acquisition method, program, and storage medium

(Original Assignee) Canon Inc; キヤノン株式会社     

Makoto Mihara, 三原 誠
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2004038313A
CLAIM 1
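A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.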

JP2004038313A
CLAIM 20
A storage medium storing a control program for causing a computer [コンピュータ] (processing data) to implement the log acquisition method according to any one of claims 1 to 19.

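To make the mechanism in JP2004038313A claim 1 quoted above easier to picture, the following Python sketch replaces a selected function's entry in a lookup table with a logging wrapper. It is an analogy under stated assumptions rather than the reference's implementation: the `function_table` dictionary stands in for loaded function addresses, and `install_logger`, `add`, and the in-memory `log` list are hypothetical names introduced for illustration.

```python
log = []  # in-memory log for the example; a real tool would persist entries


def add(a, b):
    """Stand-in for a function that performs the predetermined processing."""
    return a + b


# Hypothetical function table standing in for loaded function addresses.
function_table = {"add": add}


def install_logger(table, name, log):
    """Replace table[name] with a wrapper that records the call, runs the
    original function, records the result, and returns it to the caller."""
    original = table[name]

    def logged(*args, **kwargs):
        log.append(("call", name, args, kwargs))   # information at call time
        result = original(*args, **kwargs)         # execute the original processing
        log.append(("return", name, result))       # information at return time
        return result                              # pass the result back

    table[name] = logged


# Only the selected function is wrapped.
install_logger(function_table, "add", log)
value = function_table["add"](2, 3)
print(value)  # 5
print(log)    # [('call', 'add', (2, 3), {}), ('return', 'add', 5)]
```

The caller keeps invoking the function through the same table entry, so it needs no changes, which is the point of redirecting the address rather than editing the program.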
US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2004038313A
CLAIM 1
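A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.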

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2004038313A
CLAIM 1
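A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.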

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2004038313A
CLAIM 1
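A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.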

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2004038313A
CLAIM 1
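A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.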

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2004038313A
CLAIM 1
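A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.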

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2004038313A
CLAIM 1
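A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.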

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2004038313A
CLAIM 20
A storage medium storing a control program for causing a computer [コンピュータ] (processing data) to implement the log acquisition method according to any one of claims 1 to 19.

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2004038313A
CLAIM 1
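A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.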

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2004038313A
CLAIM 1
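A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.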

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2004038313A
CLAIM 1
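A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.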

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2004038313A
CLAIM 1
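A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.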

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2004038313A
CLAIM 1
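A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.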

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (えること) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2004038313A
CLAIM 1
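A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.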

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2004038313A
CLAIM 1
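A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.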

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2004038313A
CLAIM 20
A storage medium storing a control program for causing a computer [コンピュータ] (processing data) to implement the log acquisition method according to any one of claims 1 to 19.

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2004038313A
CLAIM 1
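A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time of calling the function selected in the selecting step; and a step of recording, as a log, predetermined information at the time of receiving the execution result of the function selected in the selecting step.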

JP2004038313A
CLAIM 13
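A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time of calling the function performing the predetermined processing and predetermined information at the time of receiving the execution result, and wherein the step of recording as a log, when the size of the log reaches or exceeds a fixed size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.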

JP2004038313A
CLAIM 20
A storage medium storing a control program for causing a computer [コンピュータ] (processing data) to implement the log acquisition method according to any one of claims 1 to 19.

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2004038313A
CLAIM 13
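A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.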

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2004038313A
CLAIM 13
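A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.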

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2004038313A
CLAIM 13
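A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.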

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2004038313A
CLAIM 1
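A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition is characterized by comprising [備えること] (data group, first data group): a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; a step of recording, as a log, predetermined information at the time the function selected in the selecting step is called; and a step of recording, as a log, predetermined information at the time the execution result of the function selected in the selecting step is received.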

JP2004038313A
CLAIM 13
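A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.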

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2004038313A
CLAIM 13
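A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.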

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2004038313A
CLAIM 13
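A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.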

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2004038313A
CLAIM 13
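A log acquisition method for acquiring a log during execution of a program that includes a function performing predetermined processing, the method comprising: a step of rewriting the address of the loaded function performing the predetermined processing to the address of a function for log acquisition; and a step of selecting the function performing the predetermined processing, wherein the function for log acquisition comprises: a step of calling the function performing the predetermined processing, causing the predetermined processing to be executed, and passing the received execution result to the program; and a step of recording, as a log, predetermined information at the time the function performing the predetermined processing is called and predetermined information at the time the execution result is received, and wherein the recording step, when the size of the log reaches or exceeds a certain size, creates a new separate file [別ファイル] (second intermediate data) and deletes logs with older creation dates and times.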




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20040003086A1

Filed: 2002-06-28     Issued: 2004-01-01

Re-partitioning directories

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Jeffrey Parham, Mark Brown
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (communicatively couple) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services (first data group) in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple) that is associated with another reducer .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple) that is associated with that reducer .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (communicatively couple) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services (first data group) in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple) that is associated with another reducer .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple) that is associated with that reducer .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (communicatively couple) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (communicatively couple) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services (first data group) in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (communicatively couple) is a merging of a portion of the first and second intermediate data set .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (communicatively couple) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (communicatively couple) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services (first data group) in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (communicatively couple) is a merging of a portion of the first and second intermediate data set .
US20040003086A1
CLAIM 1
. A method for re-partitioning directory servers supporting user services in a site receiving access requests from users , the site comprising the directory servers , where at least a first directory server stores directory objects in categorical groups such that the directory objects in each group share an attribute , and a management server communicatively coupled (first data, first data set, output data set, includes data) to the directory servers having a table storing information identifying a location for each group , the method comprising the steps of : identifying a group of directory objects in one of the directory servers for migration to another directory server ;
selecting a second directory server capable of storing the identified group of directory objects ;
transferring the identified group of directory objects from the first directory server to the second directory server ;
updating the location information in the table to indicate the identified group of directory objects is located at the second directory server ;
and deleting the identified group of directory objects from the first directory server .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030217033A1

Filed: 2002-05-17     Issued: 2003-11-20

Database system and methods

(Original Assignee) Aleri Inc     (Current Assignee) Sybase Inc

Zigmund Sandler, Vladimir Seroff, Jon Riecke, Scott Kolodzieski
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (second minimum) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (receiving step) are performed by a distributed system .
US20030217033A1
CLAIM 5
. The method of claim 1 wherein the receiving step (reducing operations) , the generating step , and the modifying step are performed by a transaction subsystem .

US20030217033A1
CLAIM 14
. The system of claim 12 further comprising a second minimum (different intermediate data, different key) recalculation engine for recalculating portions of database tables in response to the logged collected data and the metadata .

US20030217033A1
CLAIM 18
. The system of claim 17 wherein the system further comprises a second database (second data) storing data , and the transaction processing system updates the first database (first data) and the second database .

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (second resource) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US20030217033A1
CLAIM 13
. The system of claim 12 further comprising a second resource (partitioning step) manager in communication with a second adapter listener for receiving transaction data from a second queued transaction data source and for logging collected data .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (second minimum) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20030217033A1
CLAIM 14
. The system of claim 12 further comprising a second minimum (different intermediate data, different key) recalculation engine for recalculating portions of database tables in response to the logged collected data and the metadata .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (second minimum) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030217033A1
CLAIM 14
. The system of claim 12 further comprising a second minimum (different intermediate data, different key) recalculation engine for recalculating portions of database tables in response to the logged collected data and the metadata .

US20030217033A1
CLAIM 18
. The system of claim 17 wherein the system further comprises a second data (second data) base storing data , and the transaction processing system updates the first data (first data) base and the second database .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (second minimum) of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20030217033A1
CLAIM 14
. The system of claim 12 further comprising a second minimum (different intermediate data, different key) recalculation engine for recalculating portions of database tables in response to the logged collected data and the metadata .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (receiving step) are performed by a distributed system .
US20030217033A1
CLAIM 5
. The method of claim 1 wherein the receiving step (reducing operations) , the generating step , and the modifying step are performed by a transaction subsystem .

US20030217033A1
CLAIM 18
. The system of claim 17 wherein the system further comprises a second data (second data) base storing data , and the transaction processing system updates the first data (first data) base and the second database .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030217033A1
CLAIM 18
. The system of claim 17 wherein the system further comprises a second data (second data) base storing data , and the transaction processing system updates the first data (first data) base and the second database .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030212664A1

Filed: 2002-05-10     Issued: 2003-11-13

Querying markup language data sources using a relational query processor

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Martin Breining, Vanja Josifovski, Peter Schwarz
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (data definition) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (second program, first program) group has a different schema (Extensible Markup Language) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030212664A1
CLAIM 8
. The method of claim 7 , wherein the document is an Extensible Markup Language (different schema) (XML) document and the expression is an XPath expression .

US20030212664A1
CLAIM 9
. The method of claim 8 , wherein the mapping specification contains a data definition (mapping functions) language statement with an option specifying the XPath expression .

US20030212664A1
CLAIM 11
. A computer-readable medium of instructions for execution by a computer and suitable for specifying a mapping of a plurality of nodes contained in a mark-up language document according to a relational schema into a plurality of tables , comprising : first program (first data) instructions for creating a first table based on a first type of node from the document , the first instructions including an option specifying a location within the mark-up language document of the first type of node ;
and second program (first data) instructions for creating a second table based on a second type of node from the document , the second type of node related to the first type of node , and the second program instructions including an option specifying a location within the mark-up language document of the second type of node .
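
For orientation, the grouped map step and the merge on a common key recited in claim 1 of US8190610B2 above can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not the patentee's implementation: the two data groups, their schemas, the shared key `dept_id`, and every function name are hypothetical, and the "distributed system" is simulated by an in-memory loop.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
employees = [(1, "Ada"), (2, "Grace"), (1, "Linus")]       # (dept_id, name)
departments = [(1, "Engineering", "Oslo"),                  # (dept_id, dept_name, city)
               (2, "Research", "Lima")]


def partition(records, n):
    """Split one data group into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the employee group: emit (dept_id, name)."""
    return [(dept_id, name) for dept_id, name in part]


def map_department(part):
    """Map function for the department group: emit (dept_id, (dept_name, city))."""
    return [(dept_id, (dept_name, city)) for dept_id, dept_name, city in part]


def run_map(groups, n_partitions=2):
    """Map each group independently; intermediate values stay tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Merge both groups' values for one common key (an inner join)."""
    return [
        (key, name, dept_name, city)
        for name in values_by_group.get("employees", [])
        for dept_name, city in values_by_group.get("departments", [])
    ]


groups = {
    "employees": (employees, map_employee),
    "departments": (departments, map_department),
}
intermediate = run_map(groups)
output = sorted(
    row for key, by_group in intermediate.items() for row in reduce_join(key, by_group)
)
```

The output rows have a schema, (dept_id, name, dept_name, city), that differs from both input schemas, which is the kind of result that claims 33 and 40 also recite.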

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (Extensible Markup Language) than the iterator corresponding to another particular data group , for that reducer .
US20030212664A1
CLAIM 8
. The method of claim 7 , wherein the document is an Extensible Markup Language (different schema) (XML) document and the expression is an XPath expression .
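
One possible reading of the per-group iterators in claim 12 of US8190610B2 above is that a single reducer walks each data group's intermediate values with its own iterator, and each iterator is ordered by a field taken from that group's own schema. The Python sketch below illustrates that reading only; the `orders` and `visits` groups, their fields, and the function names are hypothetical and do not come from the patent or the cited reference.

```python
from operator import itemgetter


def group_iterator(values, key):
    """Iterate over one group's values, ordered by that group's own key field."""
    return iter(sorted(values, key=key))


def reduce_customer(customer_id, grouped_values):
    """Reduce one common key using a separate iterator per data group."""
    # Hypothetical schema for "orders": (order_id, amount); iterate by order_id.
    orders_it = group_iterator(grouped_values["orders"], key=itemgetter(0))
    # Hypothetical schema for "visits": (page, timestamp); iterate by timestamp.
    visits_it = group_iterator(grouped_values["visits"], key=itemgetter(1))

    order_ids = [order_id for order_id, _amount in orders_it]
    first_visit = next(visits_it, None)
    first_page = first_visit[0] if first_visit is not None else None
    return (customer_id, order_ids, first_page)


grouped = {
    "orders": [(103, 10.0), (101, 20.0), (102, 75.5)],
    "visits": [("/cart", 1700000300), ("/home", 1700000100)],
}
result = reduce_customer(7, grouped)
```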

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (data definition) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (second program, first program) group has a different schema (Extensible Markup Language) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030212664A1
CLAIM 8
. The method of claim 7 , wherein the document is an Extensible Markup Language (different schema) (XML) document and the expression is an XPath expression .

US20030212664A1
CLAIM 9
. The method of claim 8 , wherein the mapping specification contains a data definition (mapping functions) language statement with an option specifying the XPath expression .

US20030212664A1
CLAIM 11
. A computer-readable medium of instructions for execution by a computer and suitable for specifying a mapping of a plurality of nodes contained in a mark-up language document according to a relational schema into a plurality of tables , comprising : first program (first data) instructions for creating a first table based on a first type of node from the document , the first instructions including an option specifying a location within the mark-up language document of the first type of node ;
and second program (first data) instructions for creating a second table based on a second type of node from the document , the second type of node related to the first type of node , and the second program instructions including an option specifying a location within the mark-up language document of the second type of node .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (Extensible Markup Language) than the iterator corresponding to another particular data group , for that reducer .
US20030212664A1
CLAIM 8
. The method of claim 7 , wherein the document is an Extensible Markup Language (different schema) (XML) document and the expression is an XPath expression .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (Extensible Markup Language) over a computer system , the method comprising : for a first data (second program, first program) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (data definition) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first type) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030212664A1
CLAIM 8
. The method of claim 7 , wherein the document is an Extensible Markup Language (different schema) (XML) document and the expression is an XPath expression .

US20030212664A1
CLAIM 9
. The method of claim 8 , wherein the mapping specification contains a data definition (mapping functions) language statement with an option specifying the XPath expression .

US20030212664A1
CLAIM 11
. A computer-readable medium of instructions for execution by a computer and suitable for specifying a mapping of a plurality of nodes contained in a mark-up language document according to a relational schema into a plurality of tables , comprising : first program (first data) instructions for creating a first table based on a first type (first set) of node from the document , the first instructions including an option specifying a location within the mark-up language document of the first type of node ;
and second program (first data) instructions for creating a second table based on a second type of node from the document , the second type of node related to the first type of node , and the second program instructions including an option specifying a location within the mark-up language document of the second type of node .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (second program, first program) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (data definition) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first type) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (Extensible Markup Language) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030212664A1
CLAIM 8
. The method of claim 7 , wherein the document is an Extensible Markup Language (different schema) (XML) document and the expression is an XPath expression .

US20030212664A1
CLAIM 9
. The method of claim 8 , wherein the mapping specification contains a data definition (mapping functions) language statement with an option specifying the XPath expression .

US20030212664A1
CLAIM 11
. A computer-readable medium of instructions for execution by a computer and suitable for specifying a mapping of a plurality of nodes contained in a mark-up language document according to a relational schema into a plurality of tables , comprising : first program (first data) instructions for creating a first table based on a first type (first set) of node from the document , the first instructions including an option specifying a location within the mark-up language document of the first type of node ;
and second program (first data) instructions for creating a second table based on a second type of node from the document , the second type of node related to the first type of node , and the second program instructions including an option specifying a location within the mark-up language document of the second type of node .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1227396A1

Filed: 2002-01-15     Issued: 2002-07-31

A method, system and computer program product for synchronizing data represented by different data structures by using update notifications

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Corp

Donald J. Kadyk, Neil S. Fishman, Marc E. Seinfield
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1227396A1
CLAIM 1
In an environment that includes a first device storing first data (first data) and a second device storing second data (second data) , a method of synchronizing the second data with the first data , while accounting for one or more update notifications that either may or may not have been received by the second device and while accounting for any differences in how the first device and second device store data , the method comprising : an act of making a change in the first data ;
an act of sending a notification to the second device , the notification including both the change and a token identifying the change ;
an act of receiving a synchronization request from the second device ;
and an act of resending the change to the second device if the synchronization request does not include the token .
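
For comparison with the target claims, the token-based resend recited in claim 1 of EP1227396A1 above can be sketched as follows. This is a hedged illustration, not Microsoft's implementation: the class names, the in-memory change log, the integer tokens, and the dropped notification are all assumptions made for the example.

```python
import itertools


class FirstDevice:
    """Holds the first data and remembers each change under a token."""

    def __init__(self):
        self.data = {}
        self.changes = {}                 # token -> (key, value)
        self._tokens = itertools.count(1)

    def make_change(self, key, value):
        """Change the first data; return a notification with change and token."""
        self.data[key] = value
        token = next(self._tokens)
        self.changes[token] = (key, value)
        return {"token": token, "change": (key, value)}

    def handle_sync_request(self, seen_tokens):
        """Resend every change whose token is missing from the sync request."""
        return [
            {"token": token, "change": change}
            for token, change in sorted(self.changes.items())
            if token not in seen_tokens
        ]


class SecondDevice:
    """Holds the second data and tracks which change tokens it has applied."""

    def __init__(self):
        self.data = {}
        self.seen_tokens = set()

    def receive(self, notification):
        key, value = notification["change"]
        self.data[key] = value
        self.seen_tokens.add(notification["token"])

    def synchronize(self, first):
        """Send the known tokens; apply whatever the first device resends."""
        for notification in first.handle_sync_request(self.seen_tokens):
            self.receive(notification)


first, second = FirstDevice(), SecondDevice()
n1 = first.make_change("a", 1)
n2 = first.make_change("b", 2)
second.receive(n1)                        # n2 is assumed lost in transit
resent = first.handle_sync_request(second.seen_tokens)
second.synchronize(first)
```

Nothing in this flow partitions key-value data or merges intermediate results on a common key, which is worth keeping in mind when weighing the "first data" and "second data" term matches listed in the chart.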

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1227396A1
CLAIM 1
In an environment that includes a first device storing first data (first data) and a second device storing second data (second data) , a method of synchronizing the second data with the first data , while accounting for one or more update notifications that either may or may not have been received by the second device and while accounting for any differences in how the first device and second device store data , the method comprising : an act of making a change in the first data ;
an act of sending a notification to the second device , the notification including both the change and a token identifying the change ;
an act of receiving a synchronization request from the second device ;
and an act of resending the change to the second device if the synchronization request does not include the token .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1227396A1
CLAIM 1
In an environment that includes a first device storing first data (first data) and a second device storing second data (second data) , a method of synchronizing the second data with the first data , while accounting for one or more update notifications that either may or may not have been received by the second device and while accounting for any differences in how the first device and second device store data , the method comprising : an act of making a change in the first data ;
an act of sending a notification to the second device , the notification including both the change and a token identifying the change ;
an act of receiving a synchronization request from the second device ;
and an act of resending the change to the second device if the synchronization request does not include the token .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1227396A1
CLAIM 1
In an environment that includes a first device storing first data (first data) and a second device storing second data (second data) , a method of synchronizing the second data with the first data , while accounting for one or more update notifications that either may or may not have been received by the second device and while accounting for any differences in how the first device and second device store data , the method comprising : an act of making a change in the first data ;
an act of sending a notification to the second device , the notification including both the change and a token identifying the change ;
an act of receiving a synchronization request from the second device ;
and an act of resending the change to the second device if the synchronization request does not include the token .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2003157249A

Filed: 2001-11-21     Issued: 2003-05-30

Method for compressed storage of documents (文書の圧縮格納方法)

(Original Assignee) Degital Works Kk; ディジタル・ワークス株式会社     

Koji Ito, 宏二 伊藤
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (memory [メモリ]) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (前記ノード) are performed by a distributed system .
JP2003157249A
CLAIM 1
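[Claim 1] A method for compressed storage of documents, characterized by: generating a well-formed schema in which the actual data is removed from the actual-data nodes of a document structure, leaving element-identifier nodes; assigning a node identifier and a part-specific element identifier to each part of the document structure corresponding to a respective node of the well-formed schema; storing the actual data of said parts of the document structure in memory (メモリ) (different schema) in association with said node (前記ノード) (reducing operations) identifier and said part-specific element identifier; storing, in the well-formed schema, information about each of said parts of the document structure in the form of a data structure that represents it, without the actual data, by a node identifier and a part-specific element identifier; generating a compression result index (CRX) that defines the relationship between element identifiers and node identifiers, and storing it in memory; storing in memory, as a compression result set (CRS), the set of corresponding pairs of element identifiers of the well-formed schema and the compression result index (CRX); and assigning, to those parts of the document structure that are common to a plurality of documents, a common identifier for each of the element identifier and the node identifier.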
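
Illustrative note (a reading aid only, not claim construction): the grouped map-reduce flow recited in US8190610B2 claim 1 can be sketched as below. The sketch assumes two hypothetical data groups, employees and departments, whose schemas differ but share the key dept_id. All names (map_employee, map_department, run_map, reduce_join, grouped_map_reduce) and the sample records are assumptions introduced for illustration, and the code runs in a single process rather than over a distributed system.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
employees = [(1, "Ada"), (2, "Grace"), (1, "Linus")]   # (dept_id, employee_name)
departments = [(1, "Engineering"), (2, "Research")]    # (dept_id, dept_name)


def map_employee(record):
    """Mapper for the employee group: emit (dept_id, employee_name)."""
    dept_id, name = record
    return dept_id, name


def map_department(record):
    """Mapper for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    return dept_id, dept_name


def partition(records, n):
    """Split one data group into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def run_map(group, records, mapper, n_partitions=2):
    """Map every partition of one group; keep the result tagged with its group."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for record in part:
            key, value = mapper(record)
            intermediate[key].append(value)
    return group, dict(intermediate)


def reduce_join(key, iterators):
    """Reduce one key using a separate iterator per data group (inner join)."""
    names = list(iterators["employee"])
    return [(key, dept_name, name)
            for dept_name in iterators["department"]
            for name in names]


def grouped_map_reduce():
    # Intermediate data stays identifiable to the group that produced it.
    intermediates = dict([
        run_map("employee", employees, map_employee),
        run_map("department", departments, map_department),
    ])
    keys = sorted(set().union(*(d.keys() for d in intermediates.values())))
    output = []
    for key in keys:
        iterators = {g: iter(d.get(key, [])) for g, d in intermediates.items()}
        output.extend(reduce_join(key, iterators))
    return output


if __name__ == "__main__":
    for row in grouped_map_reduce():
        print(row)
```

The output rows (dept_id, dept_name, employee_name) have a schema that differs from both inputs, which is the merging on the common key that the claim language describes. The mapped JP2003157249A claim text contains no comparable per-group map and reduce stages.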

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2003157249A
CLAIM 1
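[Claim 1] A method for compressed storage of documents, characterized by: generating a well-formed schema in which the actual data is removed from the actual-data nodes of a document structure, leaving element-identifier nodes; assigning a node identifier and a part-specific element identifier to each part of the document structure corresponding to a respective node of the well-formed schema; storing the actual data of said parts of the document structure in memory (メモリ) (different schema) in association with said node identifier and said part-specific element identifier; storing, in the well-formed schema, information about each of said parts of the document structure in the form of a data structure that represents it, without the actual data, by a node identifier and a part-specific element identifier; generating a compression result index (CRX) that defines the relationship between element identifiers and node identifiers, and storing it in memory; storing in memory, as a compression result set (CRS), the set of corresponding pairs of element identifiers of the well-formed schema and the compression result index (CRX); and assigning, to those parts of the document structure that are common to a plurality of documents, a common identifier for each of the element identifier and the node identifier.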

US8190610B2
CLAIM 17
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2003157249A
CLAIM 1
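[Claim 1] A method for compressed storage of documents, characterized by: generating a well-formed schema in which the actual data is removed from the actual-data nodes of a document structure, leaving element-identifier nodes; assigning a node identifier and a part-specific element identifier to each part of the document structure corresponding to a respective node of the well-formed schema; storing the actual data of said parts of the document structure in memory (メモリ) (different schema) in association with said node identifier and said part-specific element identifier; storing, in the well-formed schema, information about each of said parts of the document structure in the form of a data structure that represents it, without the actual data, by a node identifier and a part-specific element identifier; generating a compression result index (CRX) that defines the relationship between element identifiers and node identifiers, and storing it in memory; storing in memory, as a compression result set (CRS), the set of corresponding pairs of element identifiers of the well-formed schema and the compression result index (CRX); and assigning, to those parts of the document structure that are common to a plurality of documents, a common identifier for each of the element identifier and the node identifier.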

JP2003157249A
CLAIM 3
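[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.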

US8190610B2
CLAIM 18
. The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JP2003157249A
CLAIM 3
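[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.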

US8190610B2
CLAIM 19
. The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2003157249A
CLAIM 3
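[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.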

US8190610B2
CLAIM 20
. The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2003157249A
CLAIM 3
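[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.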

US8190610B2
CLAIM 21
. The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2003157249A
CLAIM 3
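[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.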

US8190610B2
CLAIM 22
. The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2003157249A
CLAIM 1
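[Claim 1] A method for compressed storage of documents, characterized by: generating a well-formed schema in which the actual data is removed from the actual-data nodes of a document structure, leaving element-identifier nodes; assigning a node identifier and a part-specific element identifier to each part of the document structure corresponding to a respective node of the well-formed schema; storing the actual data of said parts of the document structure in memory (メモリ) (different schema) in association with said node identifier and said part-specific element identifier; storing, in the well-formed schema, information about each of said parts of the document structure in the form of a data structure that represents it, without the actual data, by a node identifier and a part-specific element identifier; generating a compression result index (CRX) that defines the relationship between element identifiers and node identifiers, and storing it in memory; storing in memory, as a compression result set (CRS), the set of corresponding pairs of element identifiers of the well-formed schema and the compression result index (CRX); and assigning, to those parts of the document structure that are common to a plurality of documents, a common identifier for each of the element identifier and the node identifier.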

JP2003157249A
CLAIM 3
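[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.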

US8190610B2
CLAIM 23
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2003157249A
CLAIM 3
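[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.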

US8190610B2
CLAIM 24
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2003157249A
CLAIM 3
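[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.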

US8190610B2
CLAIM 25
. The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2003157249A
CLAIM 3
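[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.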

US8190610B2
CLAIM 26
. The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2003157249A
CLAIM 3
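[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.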

US8190610B2
CLAIM 27
. The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JP2003157249A
CLAIM 3
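[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.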

US8190610B2
CLAIM 28
. The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2003157249A
CLAIM 3
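[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.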

US8190610B2
CLAIM 29
. The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
JP2003157249A
CLAIM 3
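[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.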

US8190610B2
CLAIM 30
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2003157249A
CLAIM 3
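[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.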

US8190610B2
CLAIM 31
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2003157249A
CLAIM 3
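[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.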

US8190610B2
CLAIM 32
. The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JP2003157249A
CLAIM 3
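[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.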

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (メモリ) over a computer system (行うこと) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (前記ノード) are performed by a distributed system .
JP2003157249A
CLAIM 1
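[Claim 1] A method for compressed storage of documents, characterized by: generating a well-formed schema in which the actual data is removed from the actual-data nodes of a document structure, leaving element-identifier nodes; assigning a node identifier and a part-specific element identifier to each part of the document structure corresponding to a respective node of the well-formed schema; storing the actual data of said parts of the document structure in memory (メモリ) (different schema) in association with said node (前記ノード) (reducing operations) identifier and said part-specific element identifier; storing, in the well-formed schema, information about each of said parts of the document structure in the form of a data structure that represents it, without the actual data, by a node identifier and a part-specific element identifier; generating a compression result index (CRX) that defines the relationship between element identifiers and node identifiers, and storing it in memory; storing in memory, as a compression result set (CRS), the set of corresponding pairs of element identifiers of the well-formed schema and the compression result index (CRX); and assigning, to those parts of the document structure that are common to a plurality of documents, a common identifier for each of the element identifier and the node identifier.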

JP2003157249A
CLAIM 3
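[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.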

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2003157249A
CLAIM 3
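[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed on the basis of the document identifiers.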

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2003157249A
CLAIM 3
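[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed on the basis of the document identifiers.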

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2003157249A
CLAIM 3
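[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed on the basis of the document identifiers.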

US8190610B2
CLAIM 40
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (メモリ) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2003157249A
CLAIM 1
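[Claim 1] A method for compressed storage of documents, characterized by: generating a well-formed schema in which the actual data is removed from the actual-data nodes of a document structure, leaving element-identifier nodes; assigning a node identifier and a part-specific element identifier to each part of the document structure corresponding to a respective node of the well-formed schema; storing the actual data of said parts of the document structure in memory (メモリ) (different schema) in association with said node identifier and said part-specific element identifier; storing, in the well-formed schema, information about each of said parts of the document structure in the form of a data structure that represents it, without the actual data, by a node identifier and a part-specific element identifier; generating a compression result index (CRX) that defines the relationship between element identifiers and node identifiers, and storing it in memory; storing in memory, as a compression result set (CRS), the set of corresponding pairs of element identifiers of the well-formed schema and the compression result index (CRX); and assigning, to those parts of the document structure that are common to a plurality of documents, a common identifier for each of the element identifier and the node identifier.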

JP2003157249A
CLAIM 3
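[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.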

US8190610B2
CLAIM 41
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2003157249A
CLAIM 3
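[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.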

US8190610B2
CLAIM 42
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2003157249A
CLAIM 3
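[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.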

US8190610B2
CLAIM 43
. The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2003157249A
CLAIM 3
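[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file (ファイル) (second intermediate data) containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.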

US8190610B2
CLAIM 44
. The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2003157249A
CLAIM 3
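[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.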

US8190610B2
CLAIM 45
. The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JP2003157249A
CLAIM 3
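[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.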

US8190610B2
CLAIM 46
. The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2003157249A
CLAIM 3
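[Claim 3] The method for compressed storage of documents according to claim 2, characterized in that a document identifier is assigned to each unit document, a document management file containing a list of the document identifiers is provided, and data retrieval can be performed (行うこと) (computer system) on the basis of the document identifiers.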




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20020184401A1

Filed: 2001-10-22     Issued: 2002-12-05

Extensible information system

(Original Assignee) Polexis Inc     (Current Assignee) Polexis Inc

Richard Kadel, Jeffrey Herman, Christopher Exline, David Almilli, Christopher Priebe
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (desired form) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (XML schema) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations (readable instructions) are performed by a distributed system .
US20020184401A1
CLAIM 9
. A computer system as defined in claim 8 , further including a TypeIO class of objects that define a desired format (data partitions) of a data object attribute specified by a TypeMetaData class .

US20020184401A1
CLAIM 35
. A computer system as defined in claim 6 , wherein the domain is defined using XML schema (different schema) .

US20020184401A1
CLAIM 75
. A program product for use in a computer system that executes program instructions recorded in a computer-readable media to perform a method for information exchange in a computer system that supports an object oriented programming environment and includes access to data storage containing data objects , the program product comprising : a recordable media ;
and a program product of computer-readable instructions (reducing operations) executable by the computer system to perform a method comprising : receiving data specifications for a set of object classes that can be extended using object oriented principles to define an information handling application , wherein the extended objects provide an information handling application that can receive one or more data objects comprising a class of data source objects , and represent the data source objects in accordance with an Information Model class of objects of a mediation layer that defines a data interface between the data source objects and a class of data consumer objects ;
wherein the Information Model objects include methods that permit data communications or information exchange between the two data object classes , such that the class configuration of the data source objects can be specified independently of the class configuration of the data consumer objects .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (XML schema) than the iterator corresponding to another particular data group , for that reducer .
US20020184401A1
CLAIM 35
. A computer system as defined in claim 6 , wherein the domain is defined using XML schema (different schema) .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with another reducer .
US20020184401A1
CLAIM 36
. A method of communicating data in a computer system that supports an object oriented programming environment and includes data (includes data) storage or access to data storage containing data objects and specifications for a set of object classes that can be extended using object oriented principles to define an information handling application , the method comprising : receiving one or more data objects comprising a class of data source objects ;
representing the data source objects in accordance with an Information Model class of objects of a mediation layer that defines a data interface between the data source objects and a class of data consumer objects ;
wherein the Information Model objects include methods that permit data communications or information exchange between the two data object classes , such that the class configuration of the data source objects can be specified independently of the class configuration of the data consumer objects .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with that reducer .
US20020184401A1
CLAIM 36
. A method of communicating data in a computer system that supports an object oriented programming environment and includes data (includes data) storage or access to data storage containing data objects and specifications for a set of object classes that can be extended using object oriented principles to define an information handling application , the method comprising : receiving one or more data objects comprising a class of data source objects ;
representing the data source objects in accordance with an Information Model class of objects of a mediation layer that defines a data interface between the data source objects and a class of data consumer objects ;
wherein the Information Model objects include methods that permit data communications or information exchange between the two data object classes , such that the class configuration of the data source objects can be specified independently of the class configuration of the data consumer objects .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (desired form) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (XML schema) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20020184401A1
CLAIM 9
. A computer system as defined in claim 8 , further including a TypeIO class of objects that define a desired format (data partitions) of a data object attribute specified by a TypeMetaData class .

US20020184401A1
CLAIM 35
. A computer system as defined in claim 6 , wherein the domain is defined using XML schema (different schema) .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (XML schema) than the iterator corresponding to another particular data group , for that reducer .
US20020184401A1
CLAIM 35
. A computer system as defined in claim 6 , wherein the domain is defined using XML schema (different schema) .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with another reducer .
US20020184401A1
CLAIM 36
. A method of communicating data in a computer system that supports an object oriented programming environment and includes data (includes data) storage or access to data storage containing data objects and specifications for a set of object classes that can be extended using object oriented principles to define an information handling application , the method comprising : receiving one or more data objects comprising a class of data source objects ;
representing the data source objects in accordance with an Information Model class of objects of a mediation layer that defines a data interface between the data source objects and a class of data consumer objects ;
wherein the Information Model objects include methods that permit data communications or information exchange between the two data object classes , such that the class configuration of the data source objects can be specified independently of the class configuration of the data consumer objects .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (includes data) that is associated with that reducer .
US20020184401A1
CLAIM 36
. A method of communicating data in a computer system that supports an object oriented programming environment and includes data (includes data) storage or access to data storage containing data objects and specifications for a set of object classes that can be extended using object oriented principles to define an information handling application , the method comprising : receiving one or more data objects comprising a class of data source objects ;
representing the data source objects in accordance with an Information Model class of objects of a mediation layer that defines a data interface between the data source objects and a class of data consumer objects ;
wherein the Information Model objects include methods that permit data communications or information exchange between the two data object classes , such that the class configuration of the data source objects can be specified independently of the class configuration of the data consumer objects .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (XML schema) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (desired form) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (selected attribute) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (selected attribute) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations (readable instructions) are performed by a distributed system .
US20020184401A1
CLAIM 9
. A computer system as defined in claim 8 , further including a TypeIO class of objects that define a desired format (data partitions) of a data object attribute specified by a TypeMetaData class .

US20020184401A1
CLAIM 13
. A computer system as defined in claim 4 , wherein the processing criteria relates to a selected attribute (second set, second data set) of the data object for display processing .

US20020184401A1
CLAIM 35
. A computer system as defined in claim 6 , wherein the domain is defined using XML schema (different schema) .

US20020184401A1
CLAIM 75
. A program product for use in a computer system that executes program instructions recorded in a computer-readable media to perform a method for information exchange in a computer system that supports an object oriented programming environment and includes access to data storage containing data objects , the program product comprising : a recordable media ;
and a program product of computer-readable instructions (reducing operations) executable by the computer system to perform a method comprising : receiving data specifications for a set of object classes that can be extended using object oriented principles to define an information handling application , wherein the extended objects provide an information handling application that can receive one or more data objects comprising a class of data source objects , and represent the data source objects in accordance with an Information Model class of objects of a mediation layer that defines a data interface between the data source objects and a class of data consumer objects ;
wherein the Information Model objects include methods that permit data communications or information exchange between the two data object classes , such that the class configuration of the data source objects can be specified independently of the class configuration of the data consumer objects .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (desired form) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set (selected attribute) having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (selected attribute) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (XML schema) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20020184401A1
CLAIM 9
. A computer system as defined in claim 8 , further including a TypeIO class of objects that define a desired format (data partitions) of a data object attribute specified by a TypeMetaData class .

US20020184401A1
CLAIM 13
. A computer system as defined in claim 4 , wherein the processing criteria relates to a selected attribute (second set, second data set) of the data object for display processing .

US20020184401A1
CLAIM 35
. A computer system as defined in claim 6 , wherein the domain is defined using XML schema (different schema) .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030074348A1

Filed: 2001-10-16     Issued: 2003-04-17

Partitioned database system

(Original Assignee) NCR Corp     (Current Assignee) Teradata US Inc

Paul Sinclair, Donald Pederson, Steven Cohen
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (second functions) has a different schema than the data of a second data group (second functions) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030074348A1
CLAIM 16
. The method of claim 15 , further comprising the step of : if more than one row of the table has identical results of the first and second functions (first set, second set, first data group, second data group) , storing those rows in a logical order corresponding to a third value for each of those rows .
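
For orientation, the grouped map-reduce flow recited in claim 1 of US8190610B2 above can be sketched roughly as follows. This is a minimal single-process illustration, not the patented implementation and not anything disclosed in US20030074348A1. The data groups (`employees`, `departments`), the common key (`dept_id`), and all function names are hypothetical, and no actual distribution across machines is attempted.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key dept_id.
employees = [("alice", 10), ("bob", 20), ("carol", 10)]   # (emp_name, dept_id)
departments = [(10, "search"), (20, "ads")]               # (dept_id, dept_name)


def partition(records, n):
    """Split one data group into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Mapper for the employee group: emit (dept_id, emp_name) pairs."""
    return [(dept_id, name) for name, dept_id in part]


def map_department(part):
    """Mapper for the department group: emit (dept_id, dept_name) pairs."""
    return [(dept_id, dept_name) for dept_id, dept_name in part]


def run_maps(groups):
    """Run each group's own mapper; keep intermediate data identifiable by group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, mapper, n_parts) in groups.items():
        for part in partition(records, n_parts):
            for key, value in mapper(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Iterate over each group's values separately and merge them on the key."""
    merged = []
    for dept_name in values_by_group.get("dept", []):
        for emp_name in sorted(values_by_group.get("emp", [])):
            merged.append((key, dept_name, emp_name))
    return merged


def map_reduce(groups):
    """Map every group, then reduce the grouped intermediate data key by key."""
    intermediate = run_maps(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


result = map_reduce({
    "emp": (employees, map_employee, 2),
    "dept": (departments, map_department, 1),
})
```

Here `result` holds rows of the form (dept_id, dept_name, emp_name), a schema that differs from both inputs. The per-group loops in `reduce_join` stand in for the per-group iterators described in the abstract.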

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (second functions) has a different schema than the data of a second data group (second functions) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030074348A1
CLAIM 16
. The method of claim 15 , further comprising the step of : if more than one row of the table has identical results of the first and second functions (first set, second set, first data group, second data group) , storing those rows in a logical order corresponding to a third value for each of those rows .

US20030074348A1
CLAIM 31
. A method for storing a row identification (row ID) in a data row structure comprising the steps of : setting a state of at least one bit in a header of a data row based on whether the row is part of a partitioned or unpartitioned table ;
including in the header a first portion (computing devices) of the row ID ;
if the state of the at least one bit indicates a partitioned table , including a second portion of the row ID in a body of the data row ;
and if the state of the at least one bit indicates a nonpartitioned table , specifying a second portion of the row ID that is assumed for the data row .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (second functions) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (second functions) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (second functions) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second functions) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030074348A1
CLAIM 15
. A method for building a partitioned database system , comprising the steps of : defining columns in a table ;
selecting a first group of one or more columns ;
selecting a first function based on values (output data set) in each column of the first group of columns ;
selecting a second group of one or more columns ;
selecting a second function based on values in each column of the second group of columns ;
creating rows of the table ;
storing rows of the table in a storage facility in a logical order corresponding to the result of the first function for each row ;
and if more than one row of the table has an identical result of the first function , storing those rows in a logical order corresponding to the result of the second function .

US20030074348A1
CLAIM 16
. The method of claim 15 , further comprising the step of : if more than one row of the table has identical results of the first and second functions (first set, second set, first data group, second data group) , storing those rows in a logical order corresponding to a third value for each of those rows .
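
The multi-level row ordering recited in claims 15 and 16 of US20030074348A1 above (order by the first function's result, break ties with the second function's result, then with a third value) can be sketched as a sort on a composite key. This is only an illustration under assumed inputs: the row layout and the three functions below are hypothetical and are not taken from the reference.

```python
# Hypothetical table rows: (order_id, customer_id, order_month)
rows = [(7, 103, 2), (3, 101, 1), (5, 103, 1), (1, 101, 1), (4, 102, 2)]


def first_function(row):
    """Hypothetical first function over the first column group (customer_id)."""
    return row[1] % 2


def second_function(row):
    """Hypothetical second function over the second column group (order_month)."""
    return row[2]


def third_value(row):
    """Hypothetical third value used as the final tie-breaker (order_id)."""
    return row[0]


def storage_order(table):
    """Return rows in the logical order: first function, then second, then third."""
    return sorted(
        table,
        key=lambda r: (first_function(r), second_function(r), third_value(r)),
    )


ordered = storage_order(rows)
```

Python compares tuples element by element, so the later entries of the composite key only matter when the earlier ones are identical, which mirrors the "if more than one row ... has an identical result" conditions in the claims.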

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20030074348A1
CLAIM 15
. A method for building a partitioned database system , comprising the steps of : defining columns in a table ;
selecting a first group of one or more columns ;
selecting a first function based on values (output data set) in each column of the first group of columns ;
selecting a second group of one or more columns ;
selecting a second function based on values in each column of the second group of columns ;
creating rows of the table ;
storing rows of the table in a storage facility in a logical order corresponding to the result of the first function for each row ;
and if more than one row of the table has an identical result of the first function , storing those rows in a logical order corresponding to the result of the second function .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (second functions) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (second functions) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (second functions) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second functions) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030074348A1
CLAIM 15
. A method for building a partitioned database system , comprising the steps of : defining columns in a table ;
selecting a first group of one or more columns ;
selecting a first function based on values (output data set) in each column of the first group of columns ;
selecting a second group of one or more columns ;
selecting a second function based on values in each column of the second group of columns ;
creating rows of the table ;
storing rows of the table in a storage facility in a logical order corresponding to the result of the first function for each row ;
and if more than one row of the table has an identical result of the first function , storing those rows in a logical order corresponding to the result of the second function .

US20030074348A1
CLAIM 16
. The method of claim 15 , further comprising the step of : if more than one row of the table has identical results of the first and second functions (first set, second set, first data group, second data group) , storing those rows in a logical order corresponding to a third value for each of those rows .

US20030074348A1
CLAIM 31
. A method for storing a row identification (row ID) in a data row structure comprising the steps of : setting a state of at least one bit in a header of a data row based on whether the row is part of a partitioned or unpartitioned table ;
including in the header a first portion (computing devices) of the row ID ;
if the state of the at least one bit indicates a partitioned table , including a second portion of the row ID in a body of the data row ;
and if the state of the at least one bit indicates a nonpartitioned table , specifying a second portion of the row ID that is assumed for the data row .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
US20030074348A1
CLAIM 15
. A method for building a partitioned database system , comprising the steps of : defining columns in a table ;
selecting a first group of one or more columns ;
selecting a first function based on values (output data set) in each column of the first group of columns ;
selecting a second group of one or more columns ;
selecting a second function based on values in each column of the second group of columns ;
creating rows of the table ;
storing rows of the table in a storage facility in a logical order corresponding to the result of the first function for each row ;
and if more than one row of the table has an identical result of the first function , storing those rows in a logical order corresponding to the result of the second function .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6553371B2

Filed: 2001-09-20     Issued: 2003-04-22

Method and system for specifying and displaying table joins in relational database queries

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Humberto Gutierrez-Rivas, Fernando Cardoso Ismerio, Brian Gerrit Payton
US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (down list) , includes data that is associated with another reducer .
US6553371B2
CLAIM 4
. The method according to claim 3 , wherein the join criteria including a join operator and a join type , and being selectable from a drop-down list (particular reducer) box .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (down list) , includes data that is associated with that reducer .
US6553371B2
CLAIM 4
. The method according to claim 3 , wherein the join criteria including a join operator and a join type , and being selectable from a drop-down list (particular reducer) box .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (display device) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6553371B2
CLAIM 1
. A software method useable in a computer database system with a display device (computing devices) for inferring and displaying a selection of valid table joins for a relational database query specifying a plurality of user-selected database tables , the method comprising the following steps : (a) assisting in a user's selection of a table join by displaying all user-selected database tables in a join overview diagram and automatically inferring and displaying in a join grid each join grid row having a potential valid table join for a pair of user-selected database tables ;
(b) accepting the user's selection of a join grid row from the join grid ;
and (c) responsive to the user's selection of a join grid row from the join grid , placing a join indicator in the join overview diagram between icons representing the pair of user-selected database tables from the user-selected join grid row .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (down list) , includes data that is associated with another reducer .
US6553371B2
CLAIM 4
. The method according to claim 3 , wherein the join criteria including a join operator and a join type , and being selectable from a drop-down list (particular reducer) box .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (down list) , includes data that is associated with that reducer .
US6553371B2
CLAIM 4
. The method according to claim 3 , wherein the join criteria including a join operator and a join type , and being selectable from a drop-down list (particular reducer) box .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6553371B2
CLAIM 1
. A software method useable in a computer database system with a display device for inferring and displaying a selection of valid table joins for a relational database query specifying a plurality of user-selected database tables , the method comprising the following steps (second set, reduce method) : (a) assisting in a user's selection of a table join by displaying all user-selected database tables in a join overview diagram and automatically inferring and displaying in a join grid each join grid row having a potential valid table join for a pair of user-selected database tables ;
(b) accepting the user's selection of a join grid row from the join grid ;
and (c) responsive to the user's selection of a join grid row from the join grid , placing a join indicator in the join overview diagram between icons representing the pair of user-selected database tables from the user-selected join grid row .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (display device) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (following steps) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6553371B2
CLAIM 1
. A software method useable in a computer database system with a display device (computing devices) for inferring and displaying a selection of valid table joins for a relational database query specifying a plurality of user-selected database tables , the method comprising the following steps (second set, reduce method) : (a) assisting in a user's selection of a table join by displaying all user-selected database tables in a join overview diagram and automatically inferring and displaying in a join grid each join grid row having a potential valid table join for a pair of user-selected database tables ;
(b) accepting the user's selection of a join grid row from the join grid ;
and (c) responsive to the user's selection of a join grid row from the join grid , placing a join indicator in the join overview diagram between icons representing the pair of user-selected database tables from the user-selected join grid row .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20030078958A1

Filed: 2001-09-04     Issued: 2003-04-24

Method and system for deploying an asset over a multi-tiered network

(Original Assignee) Op40 Inc     (Current Assignee) Op40 Inc

Charles Pace, Darin Deforest, Paolo Pizzorni, Shuang Chen
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data structure) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20030078958A1
CLAIM 18
. A method for targeting a digital asset to a multi-tiered network node , comprising : selecting a target asset adapter associated with the digital asset ;
determining an asset type associated with the digital asset ;
retrieving a descriptor from a first data (first data) structure associated with the digital asset ;
transforming the descriptor using a token replacement operation having a token associated with the node ;
running a query , using the transformed descriptor , on a table specified in the first data structure associated with the digital asset ;
creating a second data structure (second data) using data returned by the query ;
and inserting the second data structure into the first data structure associated with the digital asset .
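
For orientation only, the grouped map and reduce steps recited in claim 1 of US8190610B2 can be sketched as below. This is a minimal single-process illustration, not the patentee's implementation and not a claim construction. The sample records, the group names "employee" and "department", the common key dept_id, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas
# that share a common key (dept_id).
EMPLOYEES = [
    ("e1", {"name": "Ann", "dept_id": 10}),
    ("e2", {"name": "Bob", "dept_id": 20}),
    ("e3", {"name": "Cy", "dept_id": 10}),
]
DEPARTMENTS = [
    ("d1", {"dept_id": 10, "dept_name": "Search"}),
    ("d2", {"dept_id": 20, "dept_name": "Ads"}),
]


def partition(records, n):
    """Split one data group into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the first group: emit (dept_id, name)."""
    return [(rec["dept_id"], rec["name"]) for _, rec in part]


def map_department(part):
    """Map function for the second group: emit (dept_id, dept_name)."""
    return [(rec["dept_id"], rec["dept_name"]) for _, rec in part]


def run_maps(groups, n_partitions=2):
    """Map each group separately; keep intermediate values tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Walk each group's values with its own iterator and merge on the key."""
    employee_iter = iter(values_by_group.get("employee", []))
    dept_names = list(values_by_group.get("department", []))
    return [(key, name, dept) for name in employee_iter for dept in dept_names]


def grouped_map_reduce():
    groups = {
        "employee": (EMPLOYEES, map_employee),
        "department": (DEPARTMENTS, map_department),
    }
    intermediate = run_maps(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output
```

Calling grouped_map_reduce() yields rows whose schema (dept_id, name, dept_name) differs from both input schemas, which is the merging-on-a-common-key behavior the claim language describes. A distributed deployment would run the map and reduce calls on separate machines; the sketch does not model that.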

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data (second data structure) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US20030078958A1
CLAIM 18
. A method for targeting a digital asset to a multi-tiered network node , comprising : selecting a target asset adapter associated with the digital asset ;
determining an asset type associated with the digital asset ;
retrieving a descriptor from a first data (first data) structure associated with the digital asset ;
transforming the descriptor using a token replacement operation having a token associated with the node ;
running a query , using the transformed descriptor , on a table specified in the first data structure associated with the digital asset ;
creating a second data structure (second data) using data returned by the query ;
and inserting the second data structure into the first data structure associated with the digital asset .
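
For comparison, the token replacement and query steps recited in the cited US20030078958A1 claims can be sketched as below. This is a hedged illustration only. The `$NAME$` token syntax, the asset dictionary layout, the table and column names, and the function names are assumptions and do not come from the reference.

```python
import re
import sqlite3

# Assumed token syntax: $NAME$. The reference does not specify one here.
TOKEN = re.compile(r"\$(\w+)\$")


def transform_where(where_clause, node_tokens):
    """Replace each token with a bound parameter taken from the target node."""
    params = []

    def bind(match):
        params.append(node_tokens[match.group(1)])
        return "?"

    return TOKEN.sub(bind, where_clause), params


def run_targeted_query(conn, table, where_clause, node_tokens):
    """Run the query on the table named in the asset with the transformed clause."""
    transformed, params = transform_where(where_clause, node_tokens)
    # The table name is interpolated directly; this sketch assumes it is trusted.
    sql = f"SELECT * FROM {table} WHERE {transformed}"
    return conn.execute(sql, params).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (region TEXT, item TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?)",
        [("east", "widget", 1.5), ("west", "widget", 2.0)],
    )
    # Hypothetical digital asset: a table name plus a where clause with a token.
    asset = {"table": "prices", "where": "region = $REGION$"}
    rows = run_targeted_query(conn, asset["table"], asset["where"], {"REGION": "east"})
    asset["records"] = rows  # store the returned records in the asset
    conn.close()
    return asset
```

The sketch performs a single filtered query against one table and stores the result. It contains no partitioning, per-group mapping, or key-based reduce step.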

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .
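
The three partitioning variants recited in claims 23 to 25 of US8190610B2 (each key/value pair to a separate partition, some pairs to more than one partition, all pairs to all partitions) can be sketched as below. This is an illustrative reading under stated assumptions: integer keys, modulo routing, and a summing reducer are all hypothetical choices, as are the function names.

```python
def partition_intermediate(pairs, n, route):
    """Place each key/value pair into every partition index that route() returns."""
    parts = [[] for _ in range(n)]
    for pair in pairs:
        for idx in route(pair, n):
            parts[idx].append(pair)
    return parts


def route_exclusive(pair, n):
    """Claim 23 style: each pair goes to exactly one partition."""
    key, _ = pair
    return [key % n]


def route_replicated(pair, n):
    """Claim 24 style: a pair may go to more than one partition."""
    key, _ = pair
    return sorted({key % n, (key + 1) % n})


def route_broadcast(pair, n):
    """Claim 25 style: every pair goes to every partition."""
    return list(range(n))


def reduce_partitions(parts):
    """Hand each partition to its own reducer; here a reducer sums values per key."""
    outputs = []
    for part in parts:
        totals = {}
        for key, value in part:
            totals[key] = totals.get(key, 0) + value
        outputs.append(totals)
    return outputs
```

With pairs [(1, 10), (2, 20), (3, 30)] and two partitions, route_exclusive places (2, 20) in partition 0 and the other two pairs in partition 1. route_broadcast gives every reducer the full intermediate data set.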

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data structure) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US20030078958A1
CLAIM 18
. A method for targeting a digital asset to a multi-tiered network node , comprising : selecting a target asset adapter associated with the digital asset ;
determining an asset type associated with the digital asset ;
retrieving a descriptor from a first data (first data) structure associated with the digital asset ;
transforming the descriptor using a token replacement operation having a token associated with the node ;
running a query , using the transformed descriptor , on a table specified in the first data structure associated with the digital asset ;
creating a second data structure (second data) using data returned by the query ;
and inserting the second data structure into the first data structure associated with the digital asset .

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data structure) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US20030078958A1
CLAIM 18
. A method for targeting a digital asset to a multi-tiered network node , comprising : selecting a target asset adapter associated with the digital asset ;
determining an asset type associated with the digital asset ;
retrieving a descriptor from a first data (first data) structure associated with the digital asset ;
transforming the descriptor using a token replacement operation having a token associated with the node ;
running a query , using the transformed descriptor , on a table specified in the first data structure associated with the digital asset ;
creating a second data structure (second data) using data returned by the query ;
and inserting the second data structure into the first data structure associated with the digital asset .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , and the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .
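
The partitioning recited in claim 42 of US8190610B2, in which some intermediate key-value pairs are provided to more than one partition and each partition goes to a separate reducer, might look like the sketch below. The function names, the replicate predicate, the integer keys, and the sample data are assumptions for illustration and are not taken from the patent.

```python
def partition(pairs, num_reducers, replicate):
    """Assign each (group, key, value) triple to one or more partitions."""
    partitions = [[] for _ in range(num_reducers)]
    for group, key, value in pairs:
        if replicate(group, key):
            targets = range(num_reducers)    # copy the pair to every partition
        else:
            targets = [key % num_reducers]   # integer keys assumed
        for t in targets:
            partitions[t].append((group, key, value))
    return partitions

def reducer(partition_data):
    """Count how many pairs from each data group this reducer received."""
    counts = {}
    for group, _key, _value in partition_data:
        counts[group] = counts.get(group, 0) + 1
    return counts

# Hypothetical intermediate data from two groups.
pairs = [("orders", 1, "a"), ("orders", 2, "b"), ("orders", 3, "c"), ("rates", 0, 1.1)]
parts = partition(pairs, 2, replicate=lambda group, key: group == "rates")
results = [reducer(p) for p in parts]   # one reducer per partition
```

With replicate returning True for every pair, the same sketch covers the case of claim 43, where all key-value pairs reach all reducers.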

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , and the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .
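
A reducer that includes sort, group-by-key and combine tasks and that generates metadata about those stages, as recited in claim 44 of US8190610B2, could be sketched as below. The metadata layout (a list of stage names plus a key count) and the use of summation as the combine step are assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter

def reduce_with_metadata(pairs):
    """Sort, group by key, and combine (key, value) pairs; record metadata."""
    metadata = {"stages": []}

    ordered = sorted(pairs, key=itemgetter(0))           # sort task
    metadata["stages"].append("sort")

    grouped = {k: [v for _, v in g]                       # group-by-key task
               for k, g in groupby(ordered, key=itemgetter(0))}
    metadata["stages"].append("group-by-key")
    metadata["num_keys"] = len(grouped)

    combined = {k: sum(vs) for k, vs in grouped.items()}  # combine task
    metadata["stages"].append("combine")
    return combined, metadata

combined, metadata = reduce_with_metadata([("b", 2), ("a", 1), ("b", 3)])
```

The returned metadata can then be handed to later reduce processing, which is the arrangement claims 45 and 46 describe.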

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20030078958A1
CLAIM 1
. A method of operating a computer system (computer system) for targeting one or more digital assets on a distribution server connected to one or more networks so that the digital assets are compatible with one or more target nodes connected to the networks , the method comprising : examining the one or more digital assets to determine an asset type of the digital asset ;
if the asset type is Relational Data (RD) , retrieving one or more where clauses of the digital asset ;
executing a token replacement operation on the where clause to create a transformed where clause ;
running a query on one or more tables specified in the digital asset using the transformed where clause , the query returning one or more returned records , the returned records correlating with the target node ;
and storing the returned record in the digital asset .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2003006021A

Filed: 2001-06-27     Issued: 2003-01-10

Database system, database management method, and program (データベースシステムとデータベース管理方法およびプログラム)

(Original Assignee) Hitachi Ltd; 株式会社日立製作所     

Nobuo Kawamura, Yukio Nakano, 幸生 中野, 信男 河村
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2003006021A
CLAIM 8
[Claim 8] A program [プログラム] (corresponding different intermediate data) for causing a computer [コンピュータ] (processing data) to execute each step of the database management method according to any one of claims 4 to 7 .
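
The method of claim 1 of US8190610B2 charted above can be illustrated with a small single-process sketch: two data groups with different schemas are partitioned and mapped by different map functions, the intermediate data stays tagged with its group, and a reduce step merges both on the common key. All names and sample records below are hypothetical, and the sketch omits the distributed execution the claim requires.

```python
from collections import defaultdict

# Hypothetical groups with different schemas sharing the employee id as key.
employees = [(1, {"name": "Ann", "dept_id": 10}), (2, {"name": "Bob", "dept_id": 20})]
salaries = [(1, {"amount": 5000}), (2, {"amount": 4200}), (1, {"amount": 300})]

def map_employee(key, record):
    yield key, record["name"]      # group-specific mapping for the first group

def map_salary(key, record):
    yield key, record["amount"]    # a different mapping for the second group

def run_map(group, records, map_fn, num_partitions=2):
    """Partition one group, map each partition, tag the result with the group."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    intermediate = defaultdict(list)
    for part in partitions:
        for key, record in part:
            for out_key, value in map_fn(key, record):
                intermediate[out_key].append(value)
    return group, dict(intermediate)

def reduce_merge(key, names, amounts):
    """Consume one iterator per group, each in its own way, and merge on key."""
    return {"emp_id": key, "name": next(names, None), "total": sum(amounts)}

def run(employees, salaries):
    tagged = dict([run_map("emp", employees, map_employee),
                   run_map("sal", salaries, map_salary)])
    keys = sorted(set(tagged["emp"]) | set(tagged["sal"]))
    return [reduce_merge(k, iter(tagged["emp"].get(k, [])),
                         iter(tagged["sal"].get(k, [])))
            for k in keys]

output = run(employees, salaries)
```

The output records have a schema (emp_id, name, total) that differs from both input schemas, which is the kind of result claims 33 and 40 recite.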

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2003006021A
CLAIM 8
[Claim 8] A program for causing a computer [コンピュータ] (processing data) to execute each step of the database management method according to any one of claims 4 to 7 .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (のプログラム) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2003006021A
CLAIM 8
[Claim 8] A program [プログラム] (corresponding different intermediate data) for causing a computer to execute each step of the database management method according to any one of claims 4 to 7 .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2003006021A
CLAIM 8
[Claim 8] A program for causing a computer [コンピュータ] (processing data) to execute each step of the database management method according to any one of claims 4 to 7 .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2003006021A
CLAIM 8
[Claim 8] A program for causing a computer [コンピュータ] (processing data) to execute each step of the database management method according to any one of claims 4 to 7 .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US7086085B1

Filed: 2001-06-20     Issued: 2006-08-01

Variable trust levels for authentication

(Original Assignee) iLumin Corp     (Current Assignee) iLumin Corp

Bruce E Brown, Aaron M Brown, II Bruce-Eric Brown
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US7086085B1
CLAIM 4
. The method of claim 3 , further comprising : receiving input (different schema) specifying one of the presented actions ;
and initiating the specified action .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key (performing authentication) of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
US7086085B1
CLAIM 1
. A method for determining a level of trust in an authenticated identification , comprising : performing authentications (different key) to obtain authentication results , each authentication having a score , each result indicating whether the corresponding authentication is successful ;
combining the scores for the successful authentications to determine a level of trust ;
responsive to the determined level of trust exceeding a first predetermined threshold , allowing a first level of access to a resource ;
responsive to the determined level of trust exceeding a second predetermined threshold , allowing a second level of access to a resource ;
and wherein the first level of access comprises reading the resource and the second level of access comprises modifying the resource .

US7086085B1
CLAIM 4
. The method of claim 3 , further comprising : receiving input (different schema) specifying one of the presented actions ;
and initiating the specified action .
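
Claim 12 of US8190610B2 recites that, within one reducer, the iterator for one data group operates according to a different key of a different schema than the iterator for another data group. A minimal sketch of that arrangement follows; the field names "user_id" and "uid", the helper group_iterator, and the sample records are assumptions for illustration.

```python
def group_iterator(records, key_field, wanted_key):
    """Iterate over one group's records using that group's own key field."""
    return (r for r in records if r[key_field] == wanted_key)

# Hypothetical intermediate data: two groups whose schemas name the key differently.
clicks = [{"user_id": 7, "url": "/a"}, {"user_id": 8, "url": "/b"},
          {"user_id": 7, "url": "/c"}]
profiles = [{"uid": 7, "country": "DE"}, {"uid": 8, "country": "FR"}]

def reducer(wanted_key):
    """Join clicks with the matching profile using one iterator per group."""
    click_iter = group_iterator(clicks, "user_id", wanted_key)   # keyed on user_id
    profile_iter = group_iterator(profiles, "uid", wanted_key)   # keyed on uid
    country = next(profile_iter, {}).get("country")
    return [(c["url"], country) for c in click_iter]

joined = reducer(7)
```

Each iterator hides its group's schema from the reduce logic, so the reducer body only deals with the shared key value.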

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (receiving input) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US7086085B1
CLAIM 4
. The method of claim 3 , further comprising : receiving input (different schema) specifying one of the presented actions ;
and initiating the specified action .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key (performing authentication) of a different schema (receiving input) than the iterator corresponding to another particular data group , for that reducer .
US7086085B1
CLAIM 1
. A method for determining a level of trust in an authenticated identification , comprising : performing authentications (different key) to obtain authentication results , each authentication having a score , each result indicating whether the corresponding authentication is successful ;
combining the scores for the successful authentications to determine a level of trust ;
responsive to the determined level of trust exceeding a first predetermined threshold , allowing a first level of access to a resource ;
responsive to the determined level of trust exceeding a second predetermined threshold , allowing a second level of access to a resource ;
and wherein the first level of access comprises reading the resource and the second level of access comprises modifying the resource .

US7086085B1
CLAIM 4
. The method of claim 3 , further comprising : receiving input (different schema) specifying one of the presented actions ;
and initiating the specified action .
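
For comparison with the mapped terms, the trust-level method recited in claim 1 of US7086085B1 can be sketched as below. The function name, the (score, successful) tuple layout, and the threshold values are assumptions, and the sketch further assumes the second threshold is the higher of the two, which the claim does not state.

```python
def access_level(results, read_threshold, modify_threshold):
    """Combine scores of successful authentications and pick an access level.

    results is a list of (score, successful) tuples, one per authentication.
    """
    trust = sum(score for score, successful in results if successful)
    if trust > modify_threshold:   # second threshold: modifying the resource
        return "modify"
    if trust > read_threshold:     # first threshold: reading the resource
        return "read"
    return "none"

# Hypothetical authentications: password ok, token failed, fingerprint ok.
level = access_level([(30, True), (40, False), (25, True)],
                     read_threshold=20, modify_threshold=60)
```

The sketch involves score aggregation and thresholds only; nothing in it partitions, maps, or reduces keyed data.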

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (receiving input) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US7086085B1
CLAIM 4
. The method of claim 3 , further comprising : receiving input (different schema) specifying one of the presented actions ;
and initiating the specified action .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (receiving input) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US7086085B1
CLAIM 4
. The method of claim 3 , further comprising : receiving input (different schema) specifying one of the presented actions ;
and initiating the specified action .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6816854B2

Filed: 2001-06-08     Issued: 2004-11-09

Method and apparatus for database query decomposition

(Original Assignee) Sun Microsystems Inc     (Current Assignee) Sun Microsystems Inc

David Reiner, Jeffrey M. Miller, David C. Wheat
US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (database queries) group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6816854B2
CLAIM 1
. A database query system configured for use with a database management system , said database management system including a standard interface configured to receive database queries (particular data) , the query system comprising : a parallel interface configured to receive a first database query ;
and a query decomposer configured to : detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to the standard interface of the database management system ;
wherein said directive is embedded within a comment .
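
The query decomposition recited in claim 1 of US6816854B2 (a directive embedded in a comment, subqueries generated from it, and the subqueries conveyed in parallel) can be sketched as follows. The directive syntax "/* SPLIT column low high parts */" is invented for this example and is not taken from the reference; the sketch also assumes the query already contains a WHERE clause, and the execute callable stands in for the database management system's standard interface.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical directive syntax: /* SPLIT <column> <low> <high> <parts> */
DIRECTIVE = re.compile(r"/\*\s*SPLIT\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\*/")

def decompose(query):
    """Generate range-restricted subqueries from a directive in a comment."""
    m = DIRECTIVE.search(query)
    if m is None:
        return [query]                       # no directive: leave query intact
    column = m.group(1)
    low, high, parts = int(m.group(2)), int(m.group(3)), int(m.group(4))
    base = DIRECTIVE.sub("", query).strip()  # assumes base already has a WHERE
    step = (high - low) // parts
    subqueries = []
    for i in range(parts):
        start = low + i * step
        end = high if i == parts - 1 else start + step
        subqueries.append(f"{base} AND {column} >= {start} AND {column} < {end}")
    return subqueries

def convey_in_parallel(subqueries, execute):
    """Send every subquery to the execute callable concurrently."""
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        return list(pool.map(execute, subqueries))

subs = decompose("SELECT * FROM t WHERE x = 1 /* SPLIT id 0 100 2 */")
lengths = convey_in_parallel(subs, len)      # len stands in for a real executor
```

The sketch splits one query by value range; it has no notion of separate data groups with different schemas or of per-group iterators.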

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (database queries) group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6816854B2
CLAIM 1
. A database query system configured for use with a database management system , said database management system including a standard interface configured to receive database queries (particular data) , the query system comprising : a parallel interface configured to receive a first database query ;
and a query decomposer configured to : detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to the standard interface of the database management system ;
wherein said directive is embedded within a comment .

US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .
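
The difference between the partitioning recited in claim 23 (each key/value pair goes to a separate one of the partitions) and in claim 24 (at least some key/value pairs go to more than one partition) can be illustrated with a minimal sketch. The function names, the CRC-based slot choice, and the `replicate` predicate are assumptions made for this illustration and are not taken from US8190610B2.

```python
import zlib


def _slot(key, num_partitions):
    # Deterministic slot choice (assumed here; any stable hash would do).
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partition_exclusive(pairs, num_partitions):
    """Claim 23 style: every key/value pair lands in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[_slot(key, num_partitions)].append((key, value))
    return partitions


def partition_replicated(pairs, num_partitions, replicate=lambda key: True):
    """Claim 24 style: pairs whose key satisfies `replicate` go to every partition.

    With the default predicate, all pairs go to all partitions.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        if replicate(key):
            targets = range(num_partitions)
        else:
            targets = [_slot(key, num_partitions)]
        for index in targets:
            partitions[index].append((key, value))
    return partitions
```

Each resulting partition would then be handed to a separate reducer. With the default predicate the second function also covers the all-to-all case of claim 25.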

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .
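
As a reading aid for claim 33 of US8190610B2 (not for the charted reference), the recited flow can be sketched in a few lines: two data groups with different schemas are mapped independently, the intermediate key/value pairs stay identifiable by group, and one reduce step iterates over both groups for each common key to produce output with a third schema. The record layouts, the group names "employee" and "department", and all function names are assumptions made for this illustration, and the sketch runs in a single process rather than on a distributed system.

```python
from collections import defaultdict


def map_employee(record):
    name, dept_id = record              # first schema: (name, dept_id)
    yield dept_id, name


def map_department(record):
    dept_id, title, city = record       # second schema: (dept_id, title, city)
    yield dept_id, (title, city)


def run_map(groups):
    """groups maps a group name to (map function, list of data partitions)."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, (map_fn, partitions) in groups.items():
        for partition in partitions:
            for record in partition:
                for key, value in map_fn(record):
                    # Intermediate data remains identifiable to its data group.
                    intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, employees, departments):
    """One iterator per data group; merges the groups on the common key."""
    departments = list(departments)
    for name in employees:
        for title, city in departments:
            yield (name, key, title, city)   # output schema differs from both inputs


def map_reduce_join(groups):
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        by_group = intermediate[key]
        output.extend(
            reduce_join(key, iter(by_group["employee"]), iter(by_group["department"]))
        )
    return output
```

Keys that appear in only one group yield no output here, which corresponds to an inner join; other merge behaviors would need a different reduce function.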

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6816854B2
CLAIM 9
. A computer system (computer system) comprising : a processor ;
a storage element ;
and a database system configured to : receive a first database query ;
detect a decomposition directive corresponding to the received database query ;
generate a plurality of subqueries from the received first database query , wherein said subqueries correspond to said directive ;
and convey said plurality of subqueries in parallel to a database management system ;
wherein said directive is embedded within a comment .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6768986B2

Filed: 2001-04-03     Issued: 2004-07-27

Mapping of an RDBMS schema onto a multidimensional data model

(Original Assignee) SAP France SA     (Current Assignee) Business Objects Software Ltd

Jean-Yves Cras, Henri Biestro, Ricardo Polo-Malouvier
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema (data model) than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6768986B2
CLAIM 1
. A method of translating a relational model defined by a relational table into a multi-dimensional data model (different schema) , the method comprising the steps of : (a) if the relational table is not normalized , creating a normalized table from the relational table and defining a relationship between the relational table and the normalized table , and if the relational table is normalized , referring to the relational table as the normalized table ;
(b) transforming the normalized table into an OLAP model ;
and (c) prior to step (a) , if the relational table is normalized , but not by dependence between columns , redefining the relational table by a foreign key .

US6768986B2
CLAIM 28
. A method as in claim 27 , further comprising the step of : recursively repeating steps (first data, first data group, second data group) (j) through (m) .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (data model) than the iterator corresponding to another particular data group , for that reducer .
US6768986B2
CLAIM 1
. A method of translating a relational model defined by a relational table into a multi-dimensional data model (different schema) , the method comprising the steps of : (a) if the relational table is not normalized , creating a normalized table from the relational table and defining a relationship between the relational table and the normalized table , and if the relational table is normalized , referring to the relational table as the normalized table ;
(b) transforming the normalized table into an OLAP model ;
and (c) prior to step (a) , if the relational table is normalized , but not by dependence between columns , redefining the relational table by a foreign key .

US8190610B2
CLAIM 17
. A computer system (greatest number) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema (data model) than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6768986B2
CLAIM 1
. A method of translating a relational model defined by a relational table into a multi-dimensional data model (different schema) , the method comprising the steps of : (a) if the relational table is not normalized , creating a normalized table from the relational table and defining a relationship between the relational table and the normalized table , and if the relational table is normalized , referring to the relational table as the normalized table ;
(b) transforming the normalized table into an OLAP model ;
and (c) prior to step (a) , if the relational table is normalized , but not by dependence between columns , redefining the relational table by a foreign key .

US6768986B2
CLAIM 28
. A method as in claim 27 , further comprising the step of : recursively repeating steps (first data, first data group, second data group) (j) through (m) .

US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 18
. The computer system (greatest number) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 19
. The computer system (greatest number) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 20
. The computer system (greatest number) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 21
. The computer system (greatest number) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 22
. The computer system (greatest number) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (data model) than the iterator corresponding to another particular data group , for that reducer .
US6768986B2
CLAIM 1
. A method of translating a relational model defined by a relational table into a multi-dimensional data model (different schema) , the method comprising the steps of : (a) if the relational table is not normalized , creating a normalized table from the relational table and defining a relationship between the relational table and the normalized table , and if the relational table is normalized , referring to the relational table as the normalized table ;
(b) transforming the normalized table into an OLAP model ;
and (c) prior to step (a) , if the relational table is normalized , but not by dependence between columns , redefining the relational table by a foreign key .

US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 23
. The computer system (greatest number) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 24
. The computer system (greatest number) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 25
. The computer system (greatest number) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .
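
Illustrative note (not part of either patent): the three partitioning variants recited in claims 23, 24 and 25 of US8190610B2 can be contrasted with a minimal single-process Python sketch. All function names, the integer keys and the `replicated_keys` parameter are assumptions introduced here for illustration only; the patent does not prescribe this implementation.

```python
def partition_exclusive(pairs, n):
    """Claim 23 style: every key/value pair lands in exactly one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        # Assumption: integer keys, assigned by key modulo partition count.
        parts[key % n].append((key, value))
    return parts


def partition_overlapping(pairs, n, replicated_keys):
    """Claim 24 style: at least some pairs are given to more than one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        if key in replicated_keys:
            # Hypothetical rule: selected keys are copied to every partition.
            for part in parts:
                part.append((key, value))
        else:
            parts[key % n].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    """Claim 25 style: all pairs are given to all partitions."""
    return [list(pairs) for _ in range(n)]


# Hypothetical intermediate data; each resulting partition would feed one reducer.
PAIRS = [(1, "a"), (2, "b"), (3, "c")]
```

In each variant the i-th returned list stands for the intermediate data handed to the i-th reducer.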

US8190610B2
CLAIM 26
. The computer system (greatest number) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 27
. The computer system (greatest number) of claim 26 , wherein : the reducing includes processing the metadata .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 28
. The computer system (greatest number) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 29
. The computer system (greatest number) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 30
. The computer system (greatest number) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 31
. The computer system (greatest number) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 32
. The computer system (greatest number) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (data model) over a computer system (greatest number) , the method comprising : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema (leaf level) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6768986B2
CLAIM 1
. A method of translating a relational model defined by a relational table into a multi-dimensional data model (different schema) , the method comprising the steps of : (a) if the relational table is not normalized , creating a normalized table from the relational table and defining a relationship between the relational table and the normalized table , and if the relational table is normalized , referring to the relational table as the normalized table ;
(b) transforming the normalized table into an OLAP model ;
and (c) prior to step (a) , if the relational table is normalized , but not by dependence between columns , redefining the relational table by a foreign key .

US6768986B2
CLAIM 12
. A method as in claim 11 , further comprising the step of : (f) creating a dimension for each hierarchy that has a different leaf level (second schema) .

US6768986B2
CLAIM 28
. A method as in claim 27 , further comprising the step of : recursively repeating step (first data, first data group, second data group) s (j) through (m) .

US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .
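
Illustrative note (not part of either patent): the flow recited in claim 33 of US8190610B2 (two data sets with different schemas, mapped independently, then reduced together on a common key) can be sketched as below. This is a single-process Python sketch under stated assumptions, not a distributed implementation; the sample records, the `dept_id` key and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas that share the key "dept_id".
EMPLOYEES = [
    ("e1", {"name": "Ann", "dept_id": 10}),
    ("e2", {"name": "Bob", "dept_id": 20}),
    ("e3", {"name": "Cy", "dept_id": 10}),
]
DEPARTMENTS = [
    ("d1", {"dept_id": 10, "dept_name": "Sales"}),
    ("d2", {"dept_id": 20, "dept_name": "Ops"}),
]


def partition(records, n):
    """Split a data set into n data partitions (round-robin, an assumption)."""
    return [records[i::n] for i in range(n)]


def map_employees(part):
    """Map function for the first schema: emit (dept_id, name)."""
    for _, rec in part:
        yield rec["dept_id"], rec["name"]


def map_departments(part):
    """Map function for the second schema: emit (dept_id, dept_name)."""
    for _, rec in part:
        yield rec["dept_id"], rec["dept_name"]


def run_map(records, map_fn, n_partitions=2):
    """Apply one map function per partition; collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return intermediate


def reduce_together(first, second):
    """Iterate over common keys, using one iterator per intermediate data set."""
    output = []
    for key in sorted(first.keys() & second.keys()):
        for dept_name in iter(second[key]):
            for name in iter(first[key]):
                # The output schema differs from both input schemas.
                output.append((key, name, dept_name))
    return output


def grouped_map_reduce():
    first = run_map(EMPLOYEES, map_employees)
    second = run_map(DEPARTMENTS, map_departments)
    return reduce_together(first, second)
```

Calling `grouped_map_reduce()` merges the two intermediate data sets on `dept_id`, which amounts to an equi-join of the two hypothetical tables.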

US8190610B2
CLAIM 40
. A computer system (greatest number) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema (leaf level) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (data model) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6768986B2
CLAIM 1
. A method of translating a relational model defined by a relational table into a multi-dimensional data model (different schema) , the method comprising the steps of : (a) if the relational table is not normalized , creating a normalized table from the relational table and defining a relationship between the relational table and the normalized table , and if the relational table is normalized , referring to the relational table as the normalized table ;
(b) transforming the normalized table into an OLAP model ;
and (c) prior to step (a) , if the relational table is normalized , but not by dependence between columns , redefining the relational table by a foreign key .

US6768986B2
CLAIM 12
. A method as in claim 11 , further comprising the step of : (f) creating a dimension for each hierarchy that has a different leaf level (second schema) .

US6768986B2
CLAIM 28
. A method as in claim 27 , further comprising the step of : recursively repeating step (first data, first data group, second data group) s (j) through (m) .

US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 41
. The computer system (greatest number) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 42
. The computer system (greatest number) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 43
. The computer system (greatest number) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 44
. The computer system (greatest number) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 45
. The computer system (greatest number) of claim 44 , wherein the reducing includes processing the metadata .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .

US8190610B2
CLAIM 46
. The computer system (greatest number) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6768986B2
CLAIM 31
. A method as in claim 30 , further comprising the step of : (f) checking a distribution of groups on the list TOP along groups in the list ORD , using a column with a greatest number (computer system) of distinct values in each group .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1207464A2

Filed: 2001-02-23     Issued: 2002-05-22

Database indexing using a tree structure

(Original Assignee) Samsung Electronics Co Ltd; University of California     (Current Assignee) Samsung Electronics Co Ltd ; University of California

Yang-Lim Choi, Youngsik Huh, Bangalore S. Manjunath, Shiv Chandrasekaran
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (said sub) group has a different schema than the data of a second data group (steps a) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1207464A2
CLAIM 3
A method according to claim 2 , wherein said composite region (206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 208 2 +208 3 +208 4) comprises a plurality of said sub (first data) -regions (206 1 , 206 2 , 206 3 , 206 4 2 , 206 4 3 , 206 4 4 , 208 2 , 208 3 , 208 4) .

EP1207464A2
CLAIM 12
The method of claim 8 or 9 , after the step (c) , further comprising the steps of : (d) determining whether all approximation regions are indexed as special nodes ;
(e) if all approximation regions are not indexed as special nodes , selecting the next approximation region and performing the steps a (second data group) fter (b) on the approximation region repeatedly ;
and (f) if all approximation regions are indexed as special nodes , completing the indexing .

US8190610B2
CLAIM 17
. A computer system (preceding step) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (said sub) group has a different schema than the data of a second data group (steps a) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1207464A2
CLAIM 3
A method according to claim 2 , wherein said composite region (206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 208 2 +208 3 +208 4) comprises a plurality of said sub (first data) -regions (206 1 , 206 2 , 206 3 , 206 4 2 , 206 4 3 , 206 4 4 , 208 2 , 208 3 , 208 4) .

EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

EP1207464A2
CLAIM 12
The method of claim 8 or 9 , after the step (c) , further comprising the steps of : (d) determining whether all approximation regions are indexed as special nodes ;
(e) if all approximation regions are not indexed as special nodes , selecting the next approximation region and performing the steps a (second data group) fter (b) on the approximation region repeatedly ;
and (f) if all approximation regions are indexed as special nodes , completing the indexing .

US8190610B2
CLAIM 18
. The computer system (preceding step) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 19
. The computer system (preceding step) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 20
. The computer system (preceding step) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 21
. The computer system (preceding step) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 22
. The computer system (preceding step) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 23
. The computer system (preceding step) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 24
. The computer system (preceding step) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .
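
As an illustration of the partitioning language in US8190610B2 claim 24 above (some key/value pairs provided to more than one partition, each partition provided to a separate reducer), a minimal single-process sketch follows. It is not taken from either document. The routing rule, under which keys beginning with "*" are replicated to every partition, and the names partition_with_replication, hash_or_broadcast and sum_reducer are assumptions made purely for the example.

```python
from collections import defaultdict


def partition_with_replication(pairs, num_partitions, targets_for):
    """Place each (key, value) pair in one or more partitions.

    targets_for is a hypothetical user-supplied function that returns the
    partition indices for a key; more than one index replicates the pair.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        for index in targets_for(key, num_partitions):
            partitions[index].append((key, value))
    return partitions


def hash_or_broadcast(key, num_partitions):
    # Assumption for illustration: "*"-prefixed keys go to every partition.
    if key.startswith("*"):
        return range(num_partitions)
    # Deterministic toy hash so the example is reproducible.
    return [sum(map(ord, key)) % num_partitions]


def sum_reducer(partition):
    """Toy reducer: sort, group by key, then sum the values of each key."""
    grouped = defaultdict(list)
    for key, value in sorted(partition):
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


intermediate = [("a", 1), ("b", 2), ("*all", 10), ("a", 3)]
partitions = partition_with_replication(intermediate, 2, hash_or_broadcast)
results = [sum_reducer(p) for p in partitions]   # one reducer per partition
```

Returning every index for every key would correspond to the variant of claim 25, in which all pairs reach all partitions.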

US8190610B2
CLAIM 25
. The computer system (preceding step) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 26
. The computer system (preceding step) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 27
. The computer system (preceding step) of claim 26 , wherein : the reducing includes processing the metadata .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 28
. The computer system (preceding step) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 29
. The computer system (preceding step) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 30
. The computer system (preceding step) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 31
. The computer system (preceding step) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 32
. The computer system (preceding step) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (preceding step) , the method comprising : for a first data (said sub) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (steps a) and the second key-value pairs have a second schema (data elements) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1207464A2
CLAIM 3
A method according to claim 2 , wherein said composite region (206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 208 2 +208 3 +208 4) comprises a plurality of said sub (first data) -regions (206 1 , 206 2 , 206 3 , 206 4 2 , 206 4 3 , 206 4 4 , 208 2 , 208 3 , 208 4) .

EP1207464A2
CLAIM 4
A database system including memory means storing data elements (second schema) and memory means storing a feature vector index for said data elements , wherein said feature vector index comprises a plurality of indexing data elements (302 , 304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) in the form of a tree , each node (302 , 304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of which indexes a region (202 , 204 , 206 , 208 , 206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 206 4 1 , 208 1 , 208 2 +208 3 +208 4) of a feature vector space , characterised in that a terminal node (304 , 312 , 314 , 316 , 318 , 320) of the tree is an index for a composite region (206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 208 2 +208 3 +208 4) consisting of a plurality of regions (206 1 , 206 2 , 206 3 , 206 4 2 , 206 4 3 , 206 4 4 , 208 2 , 208 3 , 208 4) each having a population meeting a predetermined distribution criterion and being separated by no more than a predetermined distance in said feature vector space .

EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value (output data set) on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

EP1207464A2
CLAIM 12
The method of claim 8 or 9 , after the step (c) , further comprising the steps of : (d) determining whether all approximation regions are indexed as special nodes ;
(e) if all approximation regions are not indexed as special nodes , selecting the next approximation region and performing the steps a (second data group) fter (b) on the approximation region repeatedly ;
and (f) if all approximation regions are indexed as special nodes , completing the indexing .
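
To make the target claim language concrete, the following sketch shows one way the method of US8190610B2 claim 33 could be pictured: two data groups with different schemas are mapped independently, and the intermediate values sharing a key are reduced together using a separate iterator for each group. It is a single-process illustration, not a distributed implementation and not code from the patent. The employee and department schemas and all function names are hypothetical.

```python
from collections import defaultdict


def map_employee(record):
    # First schema (hypothetical): (dept_id, employee_name)
    dept_id, name = record
    yield dept_id, name


def map_department(record):
    # Second schema (hypothetical): (dept_id, dept_name, city)
    dept_id, dept_name, _city = record
    yield dept_id, dept_name


def run_map(group, records, mapper, intermediate):
    """Apply one group's mapper and file each value under (key, group)."""
    for record in records:
        for key, value in mapper(record):
            intermediate[key][group].append(value)


def reduce_join(key, employee_iter, department_iter):
    """Reduce both groups for one key, consuming one iterator per group."""
    dept_names = list(department_iter)
    for name in employee_iter:
        for dept_name in dept_names:
            # Output schema (name, dept_name) differs from both input schemas.
            yield (name, dept_name)


employees = [(1, "Ann"), (2, "Bob"), (1, "Cy")]
departments = [(1, "Sales", "Oslo"), (2, "Ops", "Rome")]

intermediate = defaultdict(lambda: defaultdict(list))
run_map("employee", employees, map_employee, intermediate)
run_map("department", departments, map_department, intermediate)

output = []
for key in sorted(intermediate):
    groups = intermediate[key]
    output.extend(reduce_join(key,
                              iter(groups["employee"]),
                              iter(groups["department"])))
```

The common key (dept_id here) is what lets the reducer relate the two groups. The input partitioning and distribution steps recited in the claim are omitted from the sketch.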

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value (output data set) on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 40
. A computer system (preceding step) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (said sub) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (steps a) and the second key-value pairs have a second schema (data elements) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1207464A2
CLAIM 3
A method according to claim 2 , wherein said composite region (206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 208 2 +208 3 +208 4) comprises a plurality of said sub (first data) -regions (206 1 , 206 2 , 206 3 , 206 4 2 , 206 4 3 , 206 4 4 , 208 2 , 208 3 , 208 4) .

EP1207464A2
CLAIM 4
A database system including memory means storing data elements (second schema) and memory means storing a feature vector index for said data elements , wherein said feature vector index comprises a plurality of indexing data elements (302 , 304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) in the form of a tree , each node (302 , 304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of which indexes a region (202 , 204 , 206 , 208 , 206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 206 4 1 , 208 1 , 208 2 +208 3 +208 4) of a feature vector space , characterised in that a terminal node (304 , 312 , 314 , 316 , 318 , 320) of the tree is an index for a composite region (206 1 +206 2 +206 3 , 206 4 2 +206 4 3 +206 4 4 , 208 2 +208 3 +208 4) consisting of a plurality of regions (206 1 , 206 2 , 206 3 , 206 4 2 , 206 4 3 , 206 4 4 , 208 2 , 208 3 , 208 4) each having a population meeting a predetermined distribution criterion and being separated by no more than a predetermined distance in said feature vector space .

EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value (output data set) on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

EP1207464A2
CLAIM 12
The method of claim 8 or 9 , after the step (c) , further comprising the steps of : (d) determining whether all approximation regions are indexed as special nodes ;
(e) if all approximation regions are not indexed as special nodes , selecting the next approximation region and performing the steps a (second data group) fter (b) on the approximation region repeatedly ;
and (f) if all approximation regions are indexed as special nodes , completing the indexing .

US8190610B2
CLAIM 41
. The computer system (preceding step) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value (output data set) on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 42
. The computer system (preceding step) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , and the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 43
. The computer system (preceding step) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 44
. The computer system (preceding step) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 45
. The computer system (preceding step) of claim 44 , wherein the reducing includes processing the metadata .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .

US8190610B2
CLAIM 46
. The computer system (preceding step) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
EP1207464A2
CLAIM 6
A method of searching a database system according to claim 4 or 5 , the method comprising : identifying a terminal node (304 , 312 , 314 , 316 , 318 , 320) indexing a region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , a feature vector space , containing a query vector ;
determining a match criterion value on the basis of the contents of said region (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) ;
for a non-terminal node (302 , 306 , 308 , 310) , determining whether the indexed region (202 , 204 , 206 , 206 4 , 208) meets the match criterion and if not , excluding child nodes (304 , 306 , 308 , 310 , 312 , 314 , 316 , 318 , 320) of said non-terminal node (302 , 306 , 308 , 310) from the search ;
repeating the preceding step (computer system) for different non-terminal nodes (302 , 306 , 308 , 310) until all non-terminal nodes (302 , 306 , 308 , 310) have been tested for said match criterion or excluded ;
and selecting data elements within the region or regions (202 , 204 , 206 1 +206 2 +206 3 , 206 4 1 , 206 4 2 +206 4 3 +206 4 4 , 208 1 , 208 2 +208 3 +208 4) , indexed by the terminal nodes (304 , 312 , 314 , 316 , 318 , 320) that are children of remaining non-terminal nodes (302 , 306 , 308 , 310) , which meet a further match criterion with respect to the query vector .
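
Illustrative sketch (not taken from EP1207464A2 or US8190610B2) : the search recited in EP1207464A2 claim 6 can be read as a prune-then-filter traversal of a region tree . The Python below is one possible reading under stated assumptions : regions are axis-aligned boxes , the match criterion value is the distance from the query vector to the nearest data element in its own terminal region , and a non-terminal node fails the criterion when its box lies farther away than that value . The names `Node` , `box_distance` , `find_leaf` and `search` are hypothetical .

```python
import math

class Node:
    def __init__(self, lo, hi, children=None, points=None):
        self.lo, self.hi = lo, hi          # corners of the indexed region
        self.children = children or []     # non-empty for non-terminal nodes
        self.points = points or []         # data elements (terminal nodes only)

def box_distance(node, q):
    # Smallest Euclidean distance from query vector q to the node's region.
    return math.sqrt(sum(max(l - x, 0, x - h) ** 2
                         for l, h, x in zip(node.lo, node.hi, q)))

def find_leaf(node, q):
    # Terminal node whose region contains q (assumes the regions cover q).
    while node.children:
        node = next(c for c in node.children if box_distance(c, q) == 0)
    return node

def search(root, q):
    leaf = find_leaf(root, q)
    # Match criterion value, assumed here to be the nearest-element distance
    # inside the query's own region (assumes that region is not empty).
    radius = min(math.dist(p, q) for p in leaf.points)
    candidates, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children:
            if box_distance(node, q) > radius:
                continue                   # criterion not met: children excluded
            stack.extend(node.children)
        else:
            candidates.extend(node.points)
    # Further match criterion: keep only elements within the radius.
    return sorted(p for p in candidates if math.dist(p, q) <= radius)

# Hypothetical two-level tree over the square [0, 10] x [0, 10].
a1 = Node((0, 0), (5, 5), points=[(1, 1), (4, 4)])
a2 = Node((0, 5), (5, 10), points=[(2, 9)])
b1 = Node((5, 0), (10, 10), points=[(6, 4), (9, 9)])
a = Node((0, 0), (5, 10), children=[a1, a2])
b = Node((5, 0), (10, 10), children=[b1])
root = Node((0, 0), (10, 10), children=[a, b])
matches = search(root, (1, 2))
```

In this example the subtree under `b` is never visited , because its region is farther from the query vector than the match criterion value . Nothing in the sketch partitions data by key or runs map and reduce functions , which is the subject matter of US8190610B2 claims 42 to 46 .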




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20020091677A1

Filed: 2001-01-16     Issued: 2002-07-11

Content dereferencing in website development

(Original Assignee) AMPERSAND Corp     (Current Assignee) AMPERSAND Corp

Mandayam Sridhar
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data model) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US20020091677A1
CLAIM 1
. A computer-implemented method for dereferencing content of a link table in a database , said link table specifying relationships between a plurality of other tables of said database , said link table comprising a plurality of link records , said link tables having a link table record ID attribute and a foreign key attribute associated with a specific attribute of one of said plurality of other tables , comprising : creating a first user data model (different schema) for said link table , said first user data model representing said link table as a child vector node and said foreign key attribute as an attribute of said child vector node ;
substituting said foreign key attribute in said first user data model with a given attribute associated with said one of said plurality of said other tables , thereby forming a second user data model ;
creating a dereferenced table from said link table using said second user data model , said dereferenced table providing , for each of said plurality of link records , content associated with said given attribute in a given record of said one of said other tables for a value associated with said foreign key attribute in said link table , said value associated with said foreign key attribute in said link table identifying said given record of said one of said other tables .
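
Illustrative sketch (not taken from either document) : the flow recited in US8190610B2 claim 1 can be approximated in a single process as follows . Two data groups with different schemas are mapped by group-specific functions , the intermediate values stay tagged with their group , and the reduce step uses one iterator per group to merge on the common key . The group names , schemas and function names are hypothetical , and distribution across machines is not simulated .

```python
from collections import defaultdict

def map_employees(partition):
    # Hypothetical schema for group "emp": (emp_id, name, dept_id).
    for emp_id, name, dept_id in partition:
        yield dept_id, name

def map_departments(partition):
    # Hypothetical schema for group "dept": (dept_id, dept_name).
    for dept_id, dept_name in partition:
        yield dept_id, dept_name

def run_map(groups, mappers):
    # intermediate[key][group] holds a list of values, so every value
    # remains identifiable to the data group that produced it.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, partitions in groups.items():
        for partition in partitions:
            for key, value in mappers[group](partition):
                intermediate[key][group].append(value)
    return intermediate

def reduce_join(key, values_by_group):
    # One iterator per data group; the groups are merged on the common key.
    for dept_name in iter(values_by_group.get("dept", [])):
        for name in iter(values_by_group.get("emp", [])):
            yield key, name, dept_name

def map_reduce(groups, mappers):
    intermediate = run_map(groups, mappers)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output

# Each group is already split into partitions (lists of records).
groups = {
    "emp": [[(1, "Ann", 10), (2, "Bob", 20)], [(3, "Cy", 10)]],
    "dept": [[(10, "Eng")], [(20, "Ops")]],
}
mappers = {"emp": map_employees, "dept": map_departments}
result = map_reduce(groups, mappers)
```

The output rows carry a schema (key , name , department name) that differs from both input schemas . The dereferencing method of US20020091677A1 claim 1 , by contrast , recites no partitioning , mapping functions or reducers .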

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (data model) than the iterator corresponding to another particular data group , for that reducer .
US20020091677A1
CLAIM 1
. A computer-implemented method for dereferencing content of a link table in a database , said link table specifying relationships between a plurality of other tables of said database , said link table comprising a plurality of link records , said link tables having a link table record ID attribute and a foreign key attribute associated with a specific attribute of one of said plurality of other tables , comprising : creating a first user data model (different schema) for said link table , said first user data model representing said link table as a child vector node and said foreign key attribute as an attribute of said child vector node ;
substituting said foreign key attribute in said first user data model with a given attribute associated with said one of said plurality of said other tables , thereby forming a second user data model ;
creating a dereferenced table from said link table using said second user data model , said dereferenced table providing , for each of said plurality of link records , content associated with said given attribute in a given record of said one of said other tables for a value associated with said foreign key attribute in said link table , said value associated with said foreign key attribute in said link table identifying said given record of said one of said other tables .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data model) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20020091677A1
CLAIM 1
. A computer-implemented method for dereferencing content of a link table in a database , said link table specifying relationships between a plurality of other tables of said database , said link table comprising a plurality of link records , said link tables having a link table record ID attribute and a foreign key attribute associated with a specific attribute of one of said plurality of other tables , comprising : creating a first user data model (different schema) for said link table , said first user data model representing said link table as a child vector node and said foreign key attribute as an attribute of said child vector node ;
substituting said foreign key attribute in said first user data model with a given attribute associated with said one of said plurality of said other tables , thereby forming a second user data model ;
creating a dereferenced table from said link table using said second user data model , said dereferenced table providing , for each of said plurality of link records , content associated with said given attribute in a given record of said one of said other tables for a value associated with said foreign key attribute in said link table , said value associated with said foreign key attribute in said link table identifying said given record of said one of said other tables .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (data model) than the iterator corresponding to another particular data group , for that reducer .
US20020091677A1
CLAIM 1
. A computer-implemented method for dereferencing content of a link table in a database , said link table specifying relationships between a plurality of other tables of said database , said link table comprising a plurality of link records , said link tables having a link table record ID attribute and a foreign key attribute associated with a specific attribute of one of said plurality of other tables , comprising : creating a first user data model (different schema) for said link table , said first user data model representing said link table as a child vector node and said foreign key attribute as an attribute of said child vector node ;
substituting said foreign key attribute in said first user data model with a given attribute associated with said one of said plurality of said other tables , thereby forming a second user data model ;
creating a dereferenced table from said link table using said second user data model , said dereferenced table providing , for each of said plurality of link records , content associated with said given attribute in a given record of said one of said other tables for a value associated with said foreign key attribute in said link table , said value associated with said foreign key attribute in said link table identifying said given record of said one of said other tables .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (data model) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (specific attribute) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20020091677A1
CLAIM 1
. A computer-implemented method for dereferencing content of a link table in a database , said link table specifying relationships between a plurality of other tables of said database , said link table comprising a plurality of link records , said link tables having a link table record ID attribute and a foreign key attribute associated with a specific attribute (second set) of one of said plurality of other tables , comprising : creating a first user data model (different schema) for said link table , said first user data model representing said link table as a child vector node and said foreign key attribute as an attribute of said child vector node ;
substituting said foreign key attribute in said first user data model with a given attribute associated with said one of said plurality of said other tables , thereby forming a second user data model ;
creating a dereferenced table from said link table using said second user data model , said dereferenced table providing , for each of said plurality of link records , content associated with said given attribute in a given record of said one of said other tables for a value associated with said foreign key attribute in said link table , said value associated with said foreign key attribute in said link table identifying said given record of said one of said other tables .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (specific attribute) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (data model) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20020091677A1
CLAIM 1
. A computer-implemented method for dereferencing content of a link table in a database , said link table specifying relationships between a plurality of other tables of said database , said link table comprising a plurality of link records , said link tables having a link table record ID attribute and a foreign key attribute associated with a specific attribute (second set) of one of said plurality of other tables , comprising : creating a first user data model (different schema) for said link table , said first user data model representing said link table as a child vector node and said foreign key attribute as an attribute of said child vector node ;
substituting said foreign key attribute in said first user data model with a given attribute associated with said one of said plurality of said other tables , thereby forming a second user data model ;
creating a dereferenced table from said link table using said second user data model , said dereferenced table providing , for each of said plurality of link records , content associated with said given attribute in a given record of said one of said other tables for a value associated with said foreign key attribute in said link table , said value associated with said foreign key attribute in said link table identifying said given record of said one of said other tables .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2002197099A

Filed: 2000-12-26     Issued: 2002-07-12

Database processing method (データベースの処理方法)

(Original Assignee) Degital Works Kk; ディジタル・ワークス株式会社     

Koji Ito, 宏二 伊藤
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2002197099A
CLAIM 1
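[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .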

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2002197099A
CLAIM 1
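[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .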

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
JP2002197099A
CLAIM 1
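[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .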

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
JP2002197099A
CLAIM 1
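[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .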

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
JP2002197099A
CLAIM 1
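[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .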

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
JP2002197099A
CLAIM 1
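[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .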

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (読み出す段階) group , for that reducer , operates according to a different key (のキー) of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2002197099A
CLAIM 1
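[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .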

JP2002197099A
CLAIM 2
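[Claim 2] The database processing method according to claim 1 , characterized by comprising : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index ; a step of searching for a compressed key on the basis of the compressed result set cache when a search condition is given ; and a step of reading out [読み出す段階] (particular data) the original key [のキー] (different key) value from the compression/decompression index on the basis of the retrieved compressed key .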
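
Illustrative sketch (not taken from JP2002197099A) : claims 1 and 2 of JP2002197099A describe building a per-item compression/decompression index , storing rows as compressed keys in a result set cache , searching on the compressed keys , and reading the original key values back from the index . The Python below is one possible reading , assuming the compression is a simple dictionary encoding of distinct values to integer codes ; the claims do not state the encoding , and all function names and sample data are hypothetical .

```python
def build_index(rows, columns):
    # Per-item (per-column) index: distinct key values <-> integer codes.
    index = {}
    for col in columns:
        values = sorted({row[col] for row in rows})
        index[col] = {"encode": {v: i for i, v in enumerate(values)},
                      "decode": values}
    return index

def build_cache(rows, columns, index):
    # Compressed result set cache: each row stores codes, not original values.
    return [tuple(index[c]["encode"][row[c]] for c in columns) for row in rows]

def search(cache, columns, index, col, value):
    # Search on the compressed key, then restore original values via the index.
    code = index[col]["encode"].get(value)
    if code is None:
        return []
    pos = columns.index(col)
    hits = [r for r in cache if r[pos] == code]
    return [{c: index[c]["decode"][k] for c, k in zip(columns, r)} for r in hits]

columns = ["city", "item"]
rows = [
    {"city": "Osaka", "item": "tea"},
    {"city": "Tokyo", "item": "rice"},
    {"city": "Osaka", "item": "rice"},
]
index = build_index(rows, columns)
cache = build_cache(rows, columns, index)
found = search(cache, columns, index, "city", "Osaka")
```

The sketch operates on a single table with a single schema . It has no counterpart to the multiple data groups , group-specific iterators or reducers recited in US8190610B2 claim 12 .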

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (えること) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2002197099A
CLAIM 1
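[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .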

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (えること) is a plurality of output data groups .
JP2002197099A
CLAIM 1
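[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .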

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2002197099A
CLAIM 1
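[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .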

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (えること) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2002197099A
CLAIM 1
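[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .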

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2002197099A
CLAIM 1
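[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .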

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (読み出す段階) group , for that reducer , is configured to operate according to a different key (のキー) of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2002197099A
CLAIM 1
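[Claim 1] A database processing method characterized by comprising [えること] (data group, first data group) : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; and a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index .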

JP2002197099A
CLAIM 2
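[Claim 2] The database processing method according to claim 1 , characterized by comprising : a step of creating and storing a compression/decompression index by compressing the key values for each item on the basis of original data ; a step of creating and storing a compressed result set cache in which a compressed key is written for each item on the basis of the compression/decompression index ; a step of searching for a compressed key on the basis of the compressed result set cache when a search condition is given ; and a step of reading out [読み出す段階] (particular data) the original key [のキー] (different key) value from the compression/decompression index on the basis of the retrieved compressed key .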

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (えること) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2002197099A
CLAIM 1
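[Claim 1] A database processing method characterized by comprising [備えること] (data group, first data group) : a step of creating and storing a compression/restoration index by compressing key values for each item based on original data ; and a step of creating and storing , based on the compression/restoration index , a compressed result set cache in which compressed keys are written for each item .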

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2002197099A
CLAIM 1
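[Claim 1] A database processing method characterized by comprising [備えること] (data group, first data group) : a step of creating and storing a compression/restoration index by compressing key values for each item based on original data ; and a step of creating and storing , based on the compression/restoration index , a compressed result set cache in which compressed keys are written for each item .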
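
For orientation when weighing this row, the flow recited in claim 33 of US8190610B2 can be sketched as below. This is a minimal single-process illustration under stated assumptions, not the patentee's implementation: the sample data, the group tags "emp" and "dept", and all function names are hypothetical, and a real system would distribute the map and reduce steps across machines.

```python
from collections import defaultdict

# Hypothetical data: two data groups with different schemas sharing the key dept_id.
EMPLOYEES = [("alice", 1), ("bob", 2), ("carol", 1)]   # schema: (emp_name, dept_id)
DEPARTMENTS = [(1, "search"), (2, "ads")]              # schema: (dept_id, dept_name)


def partition(records, n):
    """Split a data set into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map the first data group: emit (common key, tagged value) pairs."""
    return [(dept_id, ("emp", name)) for name, dept_id in part]


def map_department(part):
    """Map the second data group differently, still keyed on dept_id."""
    return [(dept_id, ("dept", dept_name)) for dept_id, dept_name in part]


def shuffle(*intermediate_sets):
    """Group intermediate values by key, keeping each data group's values apart."""
    grouped = defaultdict(lambda: defaultdict(list))
    for pairs in intermediate_sets:
        for key, (group, value) in pairs:
            grouped[key][group].append(value)
    return grouped


def reduce_join(values_by_group):
    """Iterate each group's values separately; output schema is (dept_name, emp_name)."""
    dept_names = list(values_by_group["dept"])
    emp_iter = iter(values_by_group["emp"])
    return [(dept, emp) for emp in emp_iter for dept in dept_names]


def run(n_partitions=2):
    emp_inter = [kv for p in partition(EMPLOYEES, n_partitions) for kv in map_employee(p)]
    dept_inter = [kv for p in partition(DEPARTMENTS, n_partitions) for kv in map_department(p)]
    grouped = shuffle(emp_inter, dept_inter)
    output = []
    for key in sorted(grouped):
        output.extend(reduce_join(grouped[key]))
    return output
```

Calling run() merges the two intermediate data sets on the common key and yields records in a third schema, which is the element the JP2002197099A claim text above would need to disclose for the mapping to hold.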

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (えること) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2002197099A
CLAIM 1
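[Claim 1] A database processing method characterized by comprising [備えること] (data group, first data group) : a step of creating and storing a compression/restoration index by compressing key values for each item based on original data ; and a step of creating and storing , based on the compression/restoration index , a compressed result set cache in which compressed keys are written for each item .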




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US20010051881A1

Filed: 2000-12-22     Issued: 2001-12-13

System, method and article of manufacture for managing a medical services network

(Original Assignee) NEUROGRAFIX     (Current Assignee) NEUROGRAFIX

Aaron Filler
US8190610B2
CLAIM 17
. A computer system (data capture) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 18
. The computer system (data capture) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 19
. The computer system (data capture) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 20
. The computer system (data capture) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 21
. The computer system (data capture) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 22
. The computer system (data capture) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 23
. The computer system (data capture) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 24
. The computer system (data capture) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 25
. The computer system (data capture) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .
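
Claims 23 to 25 of US8190610B2 differ only in how intermediate key/value pairs are assigned to reducer partitions, which the sketch below tries to make concrete. It is an illustrative assumption, not text from the patent: the function names, the CRC-based bucket choice, and the replicate predicate are all hypothetical.

```python
import zlib


def stable_bucket(key, n):
    """Deterministically pick one of n partitions for a key (assumed scheme)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n


def partition_exclusive(pairs, n):
    """Claim 23 style: every key/value pair lands in exactly one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[stable_bucket(key, n)].append((key, value))
    return parts


def partition_replicated(pairs, n, replicate):
    """Claim 24 style: pairs whose key satisfies replicate() go to every partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        targets = range(n) if replicate(key) else [stable_bucket(key, n)]
        for t in targets:
            parts[t].append((key, value))
    return parts


def partition_broadcast(pairs, n):
    """Claim 25 style: all key/value pairs are provided to all partitions."""
    return [list(pairs) for _ in range(n)]
```

Each returned list would then be handed to a separate reducer. The cited US20010051881A1 claim 7 concerns video capture during a diagnostic service and recites none of these partitioning modes.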

US8190610B2
CLAIM 26
. The computer system (data capture) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 27
. The computer system (data capture) of claim 26 , wherein : the reducing includes processing the metadata .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 28
. The computer system (data capture) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 29
. The computer system (data capture) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 30
. The computer system (data capture) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 31
. The computer system (data capture) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 32
. The computer system (data capture) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (data capture) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 40
. A computer system (data capture) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 41
. The computer system (data capture) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 42
. The computer system (data capture) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 43
. The computer system (data capture) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 44
. The computer system (data capture) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 45
. The computer system (data capture) of claim 44 , wherein the reducing includes processing the metadata .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .

US8190610B2
CLAIM 46
. The computer system (data capture) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US20010051881A1
CLAIM 7
. The method of claim 1 , wherein the diagnostic data includes video data captured (computer system) by a video camera during performance of the diagnostic service .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1130872A1

Filed: 2000-09-18     Issued: 2001-09-05

Method of packet scheduling, with improved delay performance, for wireless networks

(Original Assignee) Nokia of America Corp     (Current Assignee) Nokia of America Corp

Aleksandr Stolyar, Rajiv Vijayakumar
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1130872A1
CLAIM 1
A method for scheduling queued packets for service by a wireless base station , wherein there is a queue of packets destined for each of a plurality of users , the method comprising : periodically identifying a queue having a largest weighted delay ;
and scheduling the identified queue for service , during a scheduling interval , at the greatest transmission rate available for serving said queue ;
wherein : (a) each queue i has , at a given time t (first data, first data group) , a delay $W_i(t)$ determined by one of the following : the age of the oldest packet in said queue ;
a total amount of data in said queue ;
in a system in which service of said queue is regulated by a virtual queue that receives tokens at a constant rate , the age of the oldest token in the virtual queue ;
or the number of tokens in a corresponding virtual queue ;
and (b) at time t , the weighted delay of each queue i is expressed by $\gamma_i / c_i(t)$ times $W_i(t)$ , or by $\gamma_i / c_i(t)$ times an increasing function of $W_i(t)$ , wherein $\gamma_i$ is a constant , and $c_i(t)$ is a weight coefficient that represents the transmission power required per unit data rate to transmit data , at time t , to the user whose destined queue is queue i .
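
To show what EP1130872A1 claim 1 actually computes, here is a minimal Python sketch of the selection rule in parts (a) and (b): pick the queue whose weighted delay, gamma_i / c_i(t) times W_i(t), is largest. The dictionary layout, the function names, and the sample numbers in the usage note are hypothetical, and the sketch uses the plain product form rather than an increasing function of W_i(t).

```python
def weighted_delay(gamma_i, c_i_t, w_i_t):
    """Weighted delay of one queue: (gamma_i / c_i(t)) * W_i(t)."""
    return gamma_i / c_i_t * w_i_t


def pick_queue(queues):
    """Return the id of the queue with the largest weighted delay.

    queues maps a queue id to a dict with keys "gamma", "c" and "W",
    holding gamma_i, c_i(t) and W_i(t) at the current scheduling instant.
    """
    return max(
        queues,
        key=lambda i: weighted_delay(queues[i]["gamma"], queues[i]["c"], queues[i]["W"]),
    )
```

For instance, with three hypothetical queues whose weighted delays work out to 5.0, 4.0 and 6.0, pick_queue returns the id of the third. The rule selects a single queue per scheduling interval; it does not partition data by schema, map it, or merge intermediate data on a common key.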

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (time t) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1130872A1
CLAIM 1
A method for scheduling queued packets for service by a wireless base station , wherein there is a queue of packets destined for each of a plurality of users , the method comprising : periodically identifying a queue having a largest weighted delay ;
and scheduling the identified queue for service , during a scheduling interval , at the greatest transmission rate available for serving said queue ;
wherein : (a) each queue i has , at a given time t (first data, first data group) , a delay $W_i(t)$ determined by one of the following : the age of the oldest packet in said queue ;
a total amount of data in said queue ;
in a system in which service of said queue is regulated by a virtual queue that receives tokens at a constant rate , the age of the oldest token in the virtual queue ;
or the number of tokens in a corresponding virtual queue ;
and (b) at time t , the weighted delay of each queue i is expressed by $\gamma_i / c_i(t)$ times $W_i(t)$ , or by $\gamma_i / c_i(t)$ times an increasing function of $W_i(t)$ , wherein $\gamma_i$ is a constant , and $c_i(t)$ is a weight coefficient that represents the transmission power required per unit data rate to transmit data , at time t , to the user whose destined queue is queue i .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1130872A1
CLAIM 1
A method for scheduling queued packets for service by a wireless base station , wherein there is a queue of packets destined for each of a plurality of users , the method comprising : periodically identifying a queue having a largest weighted delay ;
and scheduling the identified queue for service , during a scheduling interval , at the greatest transmission rate available for serving said queue ;
wherein : (a) each queue i has , at a given time t (first data, first data group) , a delay $W_i(t)$ determined by one of the following : the age of the oldest packet in said queue ;
a total amount of data in said queue ;
in a system in which service of said queue is regulated by a virtual queue that receives tokens at a constant rate , the age of the oldest token in the virtual queue ;
or the number of tokens in a corresponding virtual queue ;
and (b) at time t , the weighted delay of each queue i is expressed by $\gamma_i / c_i(t)$ times $W_i(t)$ , or by $\gamma_i / c_i(t)$ times an increasing function of $W_i(t)$ , wherein $\gamma_i$ is a constant , and $c_i(t)$ is a weight coefficient that represents the transmission power required per unit data rate to transmit data , at time t , to the user whose destined queue is queue i .

EP1130872A1
CLAIM 6
The method of claim 4 , wherein each constant $\gamma_i$ is proportional to $-c_i \ln \delta_i / T_i$ , wherein $c_i$ represents a measured short-term average or short-term median value (output data set) of the weight coefficient $c_i(t)$ .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
EP1130872A1
CLAIM 6
The method of claim 4 , wherein each constant $\gamma_i$ is proportional to $-c_i \ln \delta_i / T_i$ , wherein $c_i$ represents a measured short-term average or short-term median value (output data set) of the weight coefficient $c_i(t)$ .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (time t) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (n value) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1130872A1
CLAIM 1
A method for scheduling queued packets for service by a wireless base station , wherein there is a queue of packets destined for each of a plurality of users , the method comprising : periodically identifying a queue having a largest weighted delay ;
and scheduling the identified queue for service , during a scheduling interval , at the greatest transmission rate available for serving said queue ;
wherein : (a) each queue i has , at a given time t (first data, first data group) , a delay W_i(t) determined by one of the following : the age of the oldest packet in said queue ;
a total amount of data in said queue ;
in a system in which service of said queue is regulated by a virtual queue that receives tokens at a constant rate , the age of the oldest token in the virtual queue ;
or the number of tokens in a corresponding virtual queue ;
and (b) at time t , the weighted delay of each queue i is expressed by γ_i / c_i(t) times W_i(t) , or by γ_i / c_i(t) times an increasing function of W_i(t) , wherein γ_i is a constant , and c_i(t) is a weight coefficient that represents the transmission power required per unit data rate to transmit data , at time t , to the user whose destined queue is queue i .

EP1130872A1
CLAIM 6
The method of claim 4 , wherein each constant γ_i is proportional to -c_i ln δ_i / T_i , wherein c_i represents a measured short-term average or short-term median value (output data set) of the weight coefficient c_i(t) .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (n value) is a merging of a portion of the first and second intermediate data set .
EP1130872A1
CLAIM 6
The method of claim 4 , wherein each constant γ_i is proportional to -c_i ln δ_i / T_i , wherein c_i represents a measured short-term average or short-term median value (output data set) of the weight coefficient c_i(t) .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6609123B1

Filed: 2000-09-01     Issued: 2003-08-19

Query engine and method for querying data using metadata model

(Original Assignee) Cognos Inc     (Current Assignee) International Business Machines Corp

Henk Cazemier, Glenn D. Rasmussen
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (individual component) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (reference object) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6609123B1
CLAIM 12
. The query engine as claimed in claim 11 , wherein the data matrix has an iterator to access an individual component (different schema) in the data matrix .

US6609123B1
CLAIM 35
. The query engine as claimed in claim 34 , wherein the application uses a subject item defined in the package layer to reference objects (different intermediate data) defined in the business layer as a basis to formulate multi-dimensional queries that are translatable to data source query .
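
For orientation when comparing the mapped references against claim 1 of US8190610B2, the following is a minimal single-process sketch of the kind of processing the claim recites: two data groups with different schemas are partitioned, each group is mapped by its own map function, the intermediate data stays identifiable to its group, and a reduce step merges the groups on the key they have in common. The sample data, the group labels (`emp`, `dept`) and all function names are hypothetical illustrations and do not come from the patent or from any cited reference. The sketch is not distributed and is not a claim construction.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas that
# share a common key (dept_id). None of this comes from the patent.
EMPLOYEES = [("alice", 1), ("bob", 2), ("carol", 1)]   # schema: (name, dept_id)
DEPARTMENTS = [(1, "search"), (2, "ads")]              # schema: (dept_id, dept_name)


def partition(records, n):
    """Split one data group into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the employee group: emit (dept_id, name) pairs."""
    return [(dept_id, name) for name, dept_id in part]


def map_department(part):
    """Map function for the department group: emit (dept_id, dept_name) pairs."""
    return [(dept_id, dept_name) for dept_id, dept_name in part]


def run_map(groups, n_partitions=2):
    """Apply each group's own map function and keep the intermediate
    data identifiable to its group: key -> group name -> list of values."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (records, map_fn) in groups.items():
        for part in partition(records, n_partitions):
            for key, value in map_fn(part):
                intermediate[key][group_name].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Iterate over each group's values separately and merge them on the
    common key, producing rows with a schema unlike either input."""
    return [(key, dept_name, name)
            for dept_name in values_by_group.get("dept", [])
            for name in values_by_group.get("emp", [])]


def map_reduce(groups):
    """Run the map phase, then reduce every key in sorted order."""
    intermediate = run_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_join(key, intermediate[key]))
    return output


result = map_reduce({"emp": (EMPLOYEES, map_employee),
                     "dept": (DEPARTMENTS, map_department)})
print(result)
```

In this sketch the per-group lists inside `reduce_join` stand in for the separate per-group iterators described in the abstract. A reference that lacks group-specific mapping or a merge on a common key would not match this shape.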

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (individual component) than the iterator corresponding to another particular data group , for that reducer .
US6609123B1
CLAIM 12
. The query engine as claimed in claim 11 , wherein the data matrix has an iterator to access an individual component (different schema) in the data matrix .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (individual component) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data (reference object) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6609123B1
CLAIM 12
. The query engine as claimed in claim 11 , wherein the data matrix has an iterator to access an individual component (different schema) in the data matrix .

US6609123B1
CLAIM 35
. The query engine as claimed in claim 34 , wherein the application uses a subject item defined in the package layer to reference objects (different intermediate data) defined in the business layer as a basis to formulate multi-dimensional queries that are translatable to data source query .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (individual component) than the iterator corresponding to another particular data group , for that reducer .
US6609123B1
CLAIM 12
. The query engine as claimed in claim 11 , wherein the data matrix has an iterator to access an individual component (different schema) in the data matrix .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (individual component) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6609123B1
CLAIM 12
. The query engine as claimed in claim 11 , wherein the data matrix has an iterator to access an individual component (different schema) in the data matrix .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (individual component) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6609123B1
CLAIM 12
. The query engine as claimed in claim 11 , wherein the data matrix has an iterator to access an individual component (different schema) in the data matrix .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
EP1077413A2

Filed: 2000-08-08     Issued: 2001-02-21

Data access history indicating method and apparatus

(Original Assignee) Sony Corp     (Current Assignee) Sony Corp

Junichi c/o Sony Computer Science Lab. Rekimoto
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
EP1077413A2
CLAIM 2
An access-history indicating method according to Claim 1 , wherein the access history icon consists of a time base and a mark representing each record of access which is disposed at the corresponding position on said time (second data, second data group) base .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (recording means) that is associated with another reducer .
EP1077413A2
CLAIM 24
A resource providing apparatus for providing a resource object including information for referring to another resource object , comprising : recording means (includes data) for sequentially recording a history of access to said resource object ;
generating means for generating a command for displaying an access history icon time-sequentially representing the history of access to said resource object ;
and adding means for adding the command generated by said generating means to said reference information .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (recording means) that is associated with that reducer .
EP1077413A2
CLAIM 24
A resource providing apparatus for providing a resource object including information for referring to another resource object , comprising : recording means (includes data) for sequentially recording a history of access to said resource object ;
generating means for generating a command for displaying an access history icon time-sequentially representing the history of access to said resource object ;
and adding means for adding the command generated by said generating means to said reference information .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said time) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
EP1077413A2
CLAIM 2
An access-history indicating method according to Claim 1 , wherein the access history icon consists of a time base and a mark representing each record of access which is disposed at the corresponding position on said time (second data, second data group) base .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (recording means) that is associated with another reducer .
EP1077413A2
CLAIM 24
A resource providing apparatus for providing a resource object including information for referring to another resource object , comprising : recording means (includes data) for sequentially recording a history of access to said resource object ;
generating means for generating a command for displaying an access history icon time-sequentially representing the history of access to said resource object ;
and adding means for adding the command generated by said generating means to said reference information .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (recording means) that is associated with that reducer .
EP1077413A2
CLAIM 24
A resource providing apparatus for providing a resource object including information for referring to another resource object , comprising : recording means (includes data) for sequentially recording a history of access to said resource object ;
generating means for generating a command for displaying an access history icon time-sequentially representing the history of access to said resource object ;
and adding means for adding the command generated by said generating means to said reference information .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
EP1077413A2
CLAIM 2
An access-history indicating method according to Claim 1 , wherein the access history icon consists of a time base and a mark representing each record of access which is disposed at the corresponding position on said time (second data, second data group) base .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said time) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
EP1077413A2
CLAIM 2
An access-history indicating method according to Claim 1 , wherein the access history icon consists of a time base and a mark representing each record of access which is disposed at the corresponding position on said time (second data, second data group) base .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2002024192A

Filed: 2000-07-07     Issued: 2002-01-25

Computer resource partitioning apparatus and resource partitioning method (計算機資源分割装置および資源分割方法)

(Original Assignee) Hitachi Ltd; 株式会社日立製作所     

Yoshiko Tamaoki, 由子 玉置, Toru Shonai, 亨 庄内, Nobutoshi Sagawa, 暢俊 佐川, Takashi Kawabe, 峻 河辺
US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2002024192A
CLAIM 15
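[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers connected to one another by a network , in which a root file (ファイル: second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ;
a step of comparing the operating status with a service level for each user ;
a step of determining , based on the comparison , whether the computer allocation for each user should be changed ;
a step of changing a computer allocation table for each user ;
and a step of issuing an instruction to change the root file name for each computer .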

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2002024192A
CLAIM 15
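[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers connected to one another by a network , in which a root file (ファイル: second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ;
a step of comparing the operating status with a service level for each user ;
a step of determining , based on the comparison , whether the computer allocation for each user should be changed ;
a step of changing a computer allocation table for each user ;
and a step of issuing an instruction to change the root file name for each computer .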

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2002024192A
CLAIM 15
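[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers connected to one another by a network , in which a root file (ファイル: second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ;
a step of comparing the operating status with a service level for each user ;
a step of determining , based on the comparison , whether the computer allocation for each user should be changed ;
a step of changing a computer allocation table for each user ;
and a step of issuing an instruction to change the root file name for each computer .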

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2002024192A
CLAIM 15
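[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers connected to one another by a network , in which a root file (ファイル: second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ;
a step of comparing the operating status with a service level for each user ;
a step of determining , based on the comparison , whether the computer allocation for each user should be changed ;
a step of changing a computer allocation table for each user ;
and a step of issuing an instruction to change the root file name for each computer .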

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data (ファイル) set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2002024192A
CLAIM 15
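[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers connected to one another by a network , in which a root file (ファイル: second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ;
a step of comparing the operating status with a service level for each user ;
a step of determining , based on the comparison , whether the computer allocation for each user should be changed ;
a step of changing a computer allocation table for each user ;
and a step of issuing an instruction to change the root file name for each computer .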

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data (ファイル) set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2002024192A
CLAIM 15
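[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers interconnected by a network , in which a root file [ファイル] (second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ; a step of comparing the operating status with the service level of each user ; a step of determining , based on the comparison , whether the computer allocation for each user should be changed ; a step of changing the computer allocation table for each user ; and a step of issuing an instruction to change the root file name of each computer .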

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data (ファイル) set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
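
For orientation only, the partitioning recited above (some key-value pairs provided to more than one partition, each partition handed to a separate reducer) might be sketched as follows. The integer keys, the modulo routing rule, and the names partition_with_replication, replicated_keys, and reducer are illustrative assumptions; none of them comes from the patent or from the cited reference.

```python
from collections import defaultdict

def partition_with_replication(pairs, n_reducers, replicated_keys):
    """Route each (key, value) pair to one partition, or to every
    partition when its key is in replicated_keys (hypothetical rule)."""
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if key in replicated_keys:
            # Replicated pair: provided to more than one partition.
            for part in partitions:
                part.append((key, value))
        else:
            # Assumed routing rule: integer key modulo reducer count.
            partitions[key % n_reducers].append((key, value))
    return partitions

def reducer(partition):
    """Group one partition's values by key."""
    grouped = defaultdict(list)
    for key, value in partition:
        grouped[key].append(value)
    return dict(grouped)

intermediate = [(1, "a"), (2, "b"), (3, "c")]
partitions = partition_with_replication(intermediate, 2, replicated_keys={3})
# One reducer per partition.
outputs = [reducer(p) for p in partitions]
```

In this sketch the pair with key 3 reaches both reducers, while keys 1 and 2 each reach exactly one.
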
JP2002024192A
CLAIM 15
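[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers interconnected by a network , in which a root file [ファイル] (second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ; a step of comparing the operating status with the service level of each user ; a step of determining , based on the comparison , whether the computer allocation for each user should be changed ; a step of changing the computer allocation table for each user ; and a step of issuing an instruction to change the root file name of each computer .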

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data (ファイル) set are provided to all of the reducers .
JP2002024192A
CLAIM 15
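[Claim 15] A computing resource partitioning method for automatically changing the allocation of computers to each user in a computing system that has a plurality of computers interconnected by a network , in which a root file [ファイル] (second intermediate data) accessed by default is set for each computer , and that processes requests from a plurality of users , the method comprising : a step of receiving the operating status of the computer resources ; a step of comparing the operating status with the service level of each user ; a step of determining , based on the comparison , whether the computer allocation for each user should be changed ; a step of changing the computer allocation table for each user ; and a step of issuing an instruction to change the root file name of each computer .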




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6721749B1

Filed: 2000-07-06     Issued: 2004-04-13

Populating a data warehouse using a pipeline approach

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Tarek Najm, Ramesh Manne, Savithri N. Dani, Karl D. Johnson, Degelhan Truesaw, Daniel P. Boerner
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (data processing program) of values are output for the corresponding different intermediate data (further process) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
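
For orientation only, the flow recited in this claim (per-group mapping of data with different schemas, followed by a reduce that merges the intermediate data on the key in common) might be sketched as follows. The sample records, the group names "emp" and "dept", the functions map_employees, map_departments, run_map, and reduce_join, and the single-process execution are all illustrative assumptions; they are not taken from the patent or from the cited reference.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas and a common key (dept_id).
EMPLOYEES = [("e1", {"name": "Ann", "dept_id": 10}),
             ("e2", {"name": "Bob", "dept_id": 20})]
DEPARTMENTS = [(10, {"title": "Sales"}), (20, {"title": "Ops"})]

def map_employees(partition):
    """Map function for the first data group: emit (dept_id, name)."""
    for _key, rec in partition:
        yield rec["dept_id"], rec["name"]

def map_departments(partition):
    """Map function for the second data group: emit (dept_id, title)."""
    for dept_id, rec in partition:
        yield dept_id, rec["title"]

def partition_data(records, n):
    """Split a data group into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def run_map(group_name, records, map_fn, n_partitions=2):
    """Apply map_fn to each partition; tag the output with its data group."""
    intermediate = defaultdict(list)
    for part in partition_data(records, n_partitions):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return group_name, dict(intermediate)

def reduce_join(intermediates):
    """Merge the per-group intermediate data on the key in common."""
    by_group = dict(intermediates)
    output = []
    for key in sorted(by_group["emp"].keys() & by_group["dept"].keys()):
        for title in by_group["dept"][key]:    # iterate the second group
            for name in by_group["emp"][key]:  # iterate the first group
                output.append((key, name, title))
    return output

result = reduce_join([
    run_map("emp", EMPLOYEES, map_employees),
    run_map("dept", DEPARTMENTS, map_departments),
])
```

Here each data group keeps its own map function and its own tagged intermediate data, and the output rows have a schema that differs from both inputs.
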
US6721749B1
CLAIM 32
. A method of populating a data warehouse with logged data from a plurality of servers , comprising : periodically providing a data processing program (different lists) to the plurality of logging servers for execution by each of the servers to pre-process data on that server ;
executing the data processing program on each of the plurality of servers to produce pre-processed data ;
providing the pre-processed data to a central collection facility ;
further process (different intermediate data) ing the data at the central collection facility ;
and loading the further processed data into the data warehouse .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with another reducer .
US6721749B1
CLAIM 10
. A system as recited in claim 1 , wherein : the data collection and warehousing system periodically provides a pre-processor component to the plurality of servers for execution by each of the servers to pre-process one or more log (particular reducer) files on that server ;
the data collection and warehousing system receives the pre-processed log files from the individual servers after they have executed the pre-processor component .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with that reducer .
US6721749B1
CLAIM 10
. A system as recited in claim 1 , wherein : the data collection and warehousing system periodically provides a pre-processor component to the plurality of servers for execution by each of the servers to pre-process one or more log (particular reducer) files on that server ;
the data collection and warehousing system receives the pre-processed log files from the individual servers after they have executed the pre-processor component .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists (data processing program) of values are output for the corresponding different intermediate data (further process) , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6721749B1
CLAIM 32
. A method of populating a data warehouse with logged data from a plurality of servers , comprising : periodically providing a data processing program (different lists) to the plurality of logging servers for execution by each of the servers to pre-process data on that server ;
executing the data processing program on each of the plurality of servers to produce pre-processed data ;
providing the pre-processed data to a central collection facility ;
further process (different intermediate data) ing the data at the central collection facility ;
and loading the further processed data into the data warehouse .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with another reducer .
US6721749B1
CLAIM 10
. A system as recited in claim 1 , wherein : the data collection and warehousing system periodically provides a pre-processor component to the plurality of servers for execution by each of the servers to pre-process one or more log (particular reducer) files on that server ;
the data collection and warehousing system receives the pre-processed log files from the individual servers after they have executed the pre-processor component .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (more log) , includes data that is associated with that reducer .
US6721749B1
CLAIM 10
. A system as recited in claim 1 , wherein : the data collection and warehousing system periodically provides a pre-processor component to the plurality of servers for execution by each of the servers to pre-process one or more log (particular reducer) files on that server ;
the data collection and warehousing system receives the pre-processed log files from the individual servers after they have executed the pre-processor component .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6381611B1

Filed: 2000-05-31     Issued: 2002-04-30

Method and system for navigation and data entry in hierarchically-organized database views

(Original Assignee) Cyberpulse LLC     (Current Assignee) Ascend Hit LLC

James Roberge, Jeffrey Soble
US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
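
For orientation only, an iterator that corresponds to one data group and supplies associated metadata to the reducing might be sketched as follows. The GroupIterator class, the "sorted" metadata flag, and the reduce_with_metadata function are hypothetical names chosen for illustration; the patent text quoted here does not specify them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List

@dataclass
class GroupIterator:
    """Iterates one data group's values for a key and carries metadata."""
    group: str
    values: List[Any]
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __iter__(self) -> Iterator[Any]:
        return iter(self.values)

def reduce_with_metadata(key, iterators):
    """Handle each group according to the metadata its iterator provides."""
    row = {"key": key}
    for it in iterators:
        vals = list(it)
        if not it.metadata.get("sorted", False):
            vals = sorted(vals)  # sort only when metadata says unsorted
        row[it.group] = vals
    return row

row = reduce_with_metadata(10, [
    GroupIterator("emp", ["Bob", "Ann"], {"sorted": False}),
    GroupIterator("dept", ["Sales"], {"sorted": True}),
])
```

The reducer in this sketch never inspects the data to learn whether it is sorted; it relies on the metadata handed over by each group's iterator.
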
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (first one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6381611B1
CLAIM 41
. A method for navigating a hierarchically-organized database having a root node comprising : (a) displaying the root node ;
(b) displaying all child nodes of the root node as selectable buttons ;
(c) selecting a first one (first set, second set) of the child nodes of the root node to start navigating the data base , thus prompting the display of the root node , the first one of the child nodes of the root node , and all child nodes of the first one of the child nodes as selectable buttons ;
(d) moving down the database by selecting one of the children buttons of the selected button , completely erasing from display all other unselected children buttons of the selected buttons , but retaining the display of selected ancestor buttons ;
and (e) navigating the database by selecting selectable buttons representing nodes of the database until leaf buttons are displayed .

US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (first one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6381611B1
CLAIM 41
. A method for navigating a hierarchically-organized database having a root node comprising : (a) displaying the root node ;
(b) displaying all child nodes of the root node as selectable buttons ;
(c) selecting a first one (first set, second set) of the child nodes of the root node to start navigating the data base , thus prompting the display of the root node , the first one of the child nodes of the root node , and all child nodes of the first one of the child nodes as selectable buttons ;
(d) moving down the database by selecting one of the children buttons of the selected button , completely erasing from display all other unselected children buttons of the selected buttons , but retaining the display of selected ancestor buttons ;
and (e) navigating the database by selecting selectable buttons representing nodes of the database until leaf buttons are displayed .

US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6381611B1
CLAIM 61
. The method of claim 52 , wherein said method is executed on a computer system (computer system) having limited screen space available for display .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2001331332A

Filed: 2000-05-22     Issued: 2001-11-30

Resource reservation method , reservation device , resource amount estimation device , and computer system for an application system

(Original Assignee) Nippon Telegr & Teleph Corp <Ntt>; 日本電信電話株式会社     

Fumio Kajiwara, 史雄 梶原
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2001331332A
CLAIM 1
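[Claim 1] A computer [コンピュ] (processing data) system comprising : resource allocation means that uses resources for each task , a task being the unit of resource allocation , to run threads , a thread being the unit of processing , that dynamically allocates resource amounts in response to requests from the task to use hardware resources or software resources , and that obtains the usage status of the task's resource amounts ; resource reservation means that reserves , changes the reservation of , and cancels the reservation of the resource amount of each resource for the task , and that , in cooperation with the resource allocation means , causes resources to be allocated so that dynamic resource use requests for a task with a reservation do not exceed the reserved resource amount ; and service level / resource amount correspondence database means that stores , for the resource amounts required by each resource reservation means , the correspondence between the service level of the application system and the resource amount .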

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task (前記リソース) ;

the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001331332A
CLAIM 1
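[Claim 1] A computer system comprising : resource allocation means that uses resources for each task , a task being the unit of resource allocation , to run threads , a thread being the unit of processing , that dynamically allocates resource amounts in response to requests from the task to use hardware resources or software resources , and that obtains the usage status of the task's resource amounts ; resource reservation means that reserves , changes the reservation of , and cancels the reservation of the resource amount of each resource for the task , and that , in cooperation with said resource [前記リソース] (combine task) allocation means , causes resources to be allocated so that dynamic resource use requests for a task with a reservation do not exceed the reserved resource amount ; and service level / resource amount correspondence database means that stores , for the resource amounts required by each resource reservation means , the correspondence between the service level of the application system and the resource amount .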

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2001331332A
CLAIM 1
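[Claim 1] A computer [コンピュ] (processing data) system comprising : resource allocation means that uses resources for each task , a task being the unit of resource allocation , to run threads , a thread being the unit of processing , that dynamically allocates resource amounts in response to requests from the task to use hardware resources or software resources , and that obtains the usage status of the task's resource amounts ; resource reservation means that reserves , changes the reservation of , and cancels the reservation of the resource amount of each resource for the task , and that , in cooperation with the resource allocation means , causes resources to be allocated so that dynamic resource use requests for a task with a reservation do not exceed the reserved resource amount ; and service level / resource amount correspondence database means that stores , for the resource amounts required by each resource reservation means , the correspondence between the service level of the application system and the resource amount .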

US8190610B2
CLAIM 17
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 18
. The computer system (行うこと) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 19
. The computer system (行うこと) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 20
. The computer system (行うこと) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 21
. The computer system (行うこと) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 22
. The computer system (行うこと) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 23
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 24
. The computer system (行うこと) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 25
. The computer system (行うこと) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。
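
Illustrative note on US8190610B2 claims 23 to 25: the three partitioning variants recited above (each key/value pair to a separate partition, at least some pairs to more than one partition, all pairs to all partitions) can be sketched as follows. This is a minimal single-process sketch, not the patentee's implementation; the function names, the CRC-based bucket choice and the `replicate` predicate are assumptions made only for illustration.

```python
import zlib

def stable_bucket(key, n):
    # Deterministic bucket index (assumption: any stable hash would do).
    return zlib.crc32(str(key).encode("utf-8")) % n

def partition_exclusive(pairs, n):
    # Claim 23 style: each key/value pair goes to exactly one partition.
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[stable_bucket(key, n)].append((key, value))
    return parts

def partition_overlapping(pairs, n, replicate):
    # Claim 24 style: pairs selected by the hypothetical `replicate`
    # predicate are copied to every partition, the rest stay exclusive.
    parts = partition_exclusive(pairs, n)
    for key, value in pairs:
        if replicate(key):
            home = stable_bucket(key, n)
            for index, part in enumerate(parts):
                if index != home:
                    part.append((key, value))
    return parts

def partition_broadcast(pairs, n):
    # Claim 25 style: all pairs are provided to all partitions.
    return [list(pairs) for _ in range(n)]

PAIRS = [("A", 1), ("B", 2), ("C", 3)]
exclusive = partition_exclusive(PAIRS, 2)
overlapping = partition_overlapping(PAIRS, 2, lambda key: key == "A")
broadcast = partition_broadcast(PAIRS, 2)
```

Each resulting partition would then be handed to a separate reducer, as the claims recite.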

US8190610B2
CLAIM 26
. The computer system (行うこと) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task (前記リソース) ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001331332A
CLAIM 1
【請求項1】 リソース割り当て単位であるタスク毎に リソースを利用して、処理単位であるスレッドを動作さ せ、該タスクに対するハードウェアリソースまたはソフ トウェアリソースの利用要求に対して動的にリソース量 を割り当て、かつ該タスクのリソース量の利用状況を取 得するリソース割り当て手段と、 リソース各々のリソース量を該タスクに対して予約・予 約変更・予約停止をそれぞれ行い、かつ前記リソース (combine task) 割 り当て手段と連携して予約が行われているタスクに対す る動的なリソース利用要求に対して予約されているリソ ース量を超えないようにリソース割り当てを行わせるリ ソース予約手段と、 各リソース予約手段に必要とされるリソース量を当該ア プリケーションシステムのサービスレベルとリソース量 との対応を蓄積したサービスレベル・リソース量対応デ ータベース手段とを備えたことを特徴とするコンピュー タシステム。

JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 27
. The computer system (行うこと) of claim 26 , wherein : the reducing includes processing the metadata .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 28
. The computer system (行うこと) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 29
. The computer system (行うこと) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2001331332A
CLAIM 1
【請求項1】 リソース割り当て単位であるタスク毎に リソースを利用して、処理単位であるスレッドを動作さ せ、該タスクに対するハードウェアリソースまたはソフ トウェアリソースの利用要求に対して動的にリソース量 を割り当て、かつ該タスクのリソース量の利用状況を取 得するリソース割り当て手段と、 リソース各々のリソース量を該タスクに対して予約・予 約変更・予約停止をそれぞれ行い、かつ前記リソース割 り当て手段と連携して予約が行われているタスクに対す る動的なリソース利用要求に対して予約されているリソ ース量を超えないようにリソース割り当てを行わせるリ ソース予約手段と、 各リソース予約手段に必要とされるリソース量を当該ア プリケーションシステムのサービスレベルとリソース量 との対応を蓄積したサービスレベル・リソース量対応デ ータベース手段とを備えたことを特徴とするコンピュ (processing data) ー タシステム。

JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 30
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 31
. The computer system (行うこと) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 32
. The computer system (行うこと) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema over a computer system (行うこと) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2001331332A
CLAIM 1
【請求項1】 リソース割り当て単位であるタスク毎に リソースを利用して、処理単位であるスレッドを動作さ せ、該タスクに対するハードウェアリソースまたはソフ トウェアリソースの利用要求に対して動的にリソース量 を割り当て、かつ該タスクのリソース量の利用状況を取 得するリソース割り当て手段と、 リソース各々のリソース量を該タスクに対して予約・予 約変更・予約停止をそれぞれ行い、かつ前記リソース割 り当て手段と連携して予約が行われているタスクに対す る動的なリソース利用要求に対して予約されているリソ ース量を超えないようにリソース割り当てを行わせるリ ソース予約手段と、 各リソース予約手段に必要とされるリソース量を当該ア プリケーションシステムのサービスレベルとリソース量 との対応を蓄積したサービスレベル・リソース量対応デ ータベース手段とを備えたことを特徴とするコンピュ (processing data) ー タシステム。

JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task (前記リソース) , the method further comprises generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001331332A
CLAIM 1
【請求項1】 リソース割り当て単位であるタスク毎に リソースを利用して、処理単位であるスレッドを動作さ せ、該タスクに対するハードウェアリソースまたはソフ トウェアリソースの利用要求に対して動的にリソース量 を割り当て、かつ該タスクのリソース量の利用状況を取 得するリソース割り当て手段と、 リソース各々のリソース量を該タスクに対して予約・予 約変更・予約停止をそれぞれ行い、かつ前記リソース (combine task) 割 り当て手段と連携して予約が行われているタスクに対す る動的なリソース利用要求に対して予約されているリソ ース量を超えないようにリソース割り当てを行わせるリ ソース予約手段と、 各リソース予約手段に必要とされるリソース量を当該ア プリケーションシステムのサービスレベルとリソース量 との対応を蓄積したサービスレベル・リソース量対応デ ータベース手段とを備えたことを特徴とするコンピュー タシステム。

US8190610B2
CLAIM 40
. A computer system (行うこと) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 41
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 42
. The computer system (行うこと) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 43
. The computer system (行うこと) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 44
. The computer system (行うこと) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task (前記リソース) , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001331332A
CLAIM 1
【請求項1】 リソース割り当て単位であるタスク毎に リソースを利用して、処理単位であるスレッドを動作さ せ、該タスクに対するハードウェアリソースまたはソフ トウェアリソースの利用要求に対して動的にリソース量 を割り当て、かつ該タスクのリソース量の利用状況を取 得するリソース割り当て手段と、 リソース各々のリソース量を該タスクに対して予約・予 約変更・予約停止をそれぞれ行い、かつ前記リソース (combine task) 割 り当て手段と連携して予約が行われているタスクに対す る動的なリソース利用要求に対して予約されているリソ ース量を超えないようにリソース割り当てを行わせるリ ソース予約手段と、 各リソース予約手段に必要とされるリソース量を当該ア プリケーションシステムのサービスレベルとリソース量 との対応を蓄積したサービスレベル・リソース量対応デ ータベース手段とを備えたことを特徴とするコンピュー タシステム。

JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 45
. The computer system (行うこと) of claim 44 , wherein the reducing includes processing the metadata .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。

US8190610B2
CLAIM 46
. The computer system (行うこと) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
JP2001331332A
CLAIM 3
【請求項3】 請求項2に記載のアプリケーションシス テムのリソース予約方法において、 前記サービスレベル・リソース量対応データベース手段 は、一つ以上のタスクと一つ以上のスレッドから構成さ れたアプリケーションシステムに対して模擬的に負荷を かけ、 該負荷を変化させながら、実現できたサービスの質であ るサービスレベルと各リソース予約手段から取得される リソース量の利用状況との対応を前記サービスレベル・ リソース量対応データベース手段に蓄積して、初期化を 行うこと (computer system) を特徴とするアプリケーションシステムのリソ ース予約方法。




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JP2001298453A

Filed: 2000-04-14     Issued: 2001-10-26

Network display device (ネットワーク表示装置)

(Original Assignee) Fuji Xerox Co Ltd; 富士ゼロックス株式会社     

Hantai Takahashi, 範泰 高橋
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JP2001298453A
CLAIM 6
【請求項6】 コンピュ (processing data) ータに実行させるプログラムを 当該コンピュータの入力手段が読取可能に記憶した記憶 媒体において、 当該プログラムは、ノードの集合及びノード間の関係を 示す属性が付与されて当該ノード間を関係付けるリンク の集合からなるネットワークの構造を規定するネットワ ークデータをネットワークデータメモリ (different schema) から読み出す処 理と、 読み出したネットワークデータにより規定されるネット ワークの構造に基づいて当該ネットワークに含まれるノ ードの配置を決定する処理と、 複数の属性の指定を含む指定情報を受け付ける処理と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定する処理と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、決定されたノードの配置 を調整する処理と、 調整されたノード配置でネットワークを表示する処理と を当該コンピュータに実行させることを特徴とする記憶 媒体。
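
Illustrative note on US8190610B2 claim 1: the claimed flow (per-group partitioning and mapping, intermediate data identifiable to its data group, and a reduce that handles each group in its own manner and merges on the common key) can be sketched as below. This is a minimal sketch under stated assumptions, not the patentee's implementation: it runs in a single process rather than on a distributed system, and the employee/department records, the key `dept_id` and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical data groups with different schemas sharing the key dept_id.
EMPLOYEES = [(1, "A", 100), (2, "B", 50), (3, "A", 25)]  # (emp_id, dept_id, bonus)
DEPARTMENTS = [("A", 2), ("B", 3)]                       # (dept_id, bonus_rate)

def map_employee(record):
    # Group-specific map: emit (dept_id, bonus).
    _emp_id, dept_id, bonus = record
    yield dept_id, bonus

def map_department(record):
    # Group-specific map: emit (dept_id, bonus_rate).
    dept_id, rate = record
    yield dept_id, rate

MAPPERS = {"employee": map_employee, "department": map_department}

def partition(records, n):
    # Split one data group into n data partitions (round-robin).
    return [records[i::n] for i in range(n)]

def run_map(groups, n_partitions=2):
    # intermediate[key][group] keeps values identifiable to their data group.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in groups.items():
        for part in partition(records, n_partitions):
            for record in part:
                for key, value in MAPPERS[group](record):
                    intermediate[key][group].append(value)
    return intermediate

def reduce_join(key, values_by_group):
    # One iterator per data group; the merge happens on the common key.
    bonuses = iter(values_by_group.get("employee", []))
    rates = iter(values_by_group.get("department", []))
    rate = next(rates, 0)
    return key, sum(bonuses) * rate

def run(groups):
    intermediate = run_map(groups)
    return dict(reduce_join(k, v) for k, v in sorted(intermediate.items()))

result = run({"employee": EMPLOYEES, "department": DEPARTMENTS})
print(result)  # {'A': 250, 'B': 150}
```

The output schema (department, adjusted bonus total) differs from both input schemas, which is the kind of merged output data group the reducing step of the claim describes.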

US8190610B2
CLAIM 9
. The method of claim 7 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the method further comprises generating and providing metadata (関係付け) for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001298453A
CLAIM 1
【請求項1】 ノードの集合及びノード間の関係を示す 属性が付与されて当該ノード間を関係付け (providing metadata) るリンクの集 合からなるネットワークを表示するネットワーク表示装 置において、 ネットワークの構造を規定するネットワークデータを記 憶するネットワークデータ記憶手段と、 ネットワークデータにより規定されるネットワークの構 造に基づいて当該ネットワークに含まれるノードの配置 を決定するノード配置決定手段と、 複数の属性の指定を含む指定情報を受け付ける受付手段 と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定するリンク重み決 定手段と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、ノード配置決定手段によ り決定されたノードの配置を調整するノード配置調整手 段と、 調整されたノード配置でネットワークを表示するネット ワーク表示手段と、 を備えたことを特徴とするネットワーク表示装置。

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2001298453A
CLAIM 6
【請求項6】 コンピュータに実行させるプログラムを 当該コンピュータの入力手段が読取可能に記憶した記憶 媒体において、 当該プログラムは、ノードの集合及びノード間の関係を 示す属性が付与されて当該ノード間を関係付けるリンク の集合からなるネットワークの構造を規定するネットワ ークデータをネットワークデータメモリ (different schema) から読み出す処 理と、 読み出したネットワークデータにより規定されるネット ワークの構造に基づいて当該ネットワークに含まれるノ ードの配置を決定する処理と、 複数の属性の指定を含む指定情報を受け付ける処理と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定する処理と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、決定されたノードの配置 を調整する処理と、 調整されたノード配置でネットワークを表示する処理と を当該コンピュータに実行させることを特徴とする記憶 媒体。

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JP2001298453A
CLAIM 6
【請求項6】 コンピュ (processing data) ータに実行させるプログラムを 当該コンピュータの入力手段が読取可能に記憶した記憶 媒体において、 当該プログラムは、ノードの集合及びノード間の関係を 示す属性が付与されて当該ノード間を関係付けるリンク の集合からなるネットワークの構造を規定するネットワ ークデータをネットワークデータメモリから読み出す処 理と、 読み出したネットワークデータにより規定されるネット ワークの構造に基づいて当該ネットワークに含まれるノ ードの配置を決定する処理と、 複数の属性の指定を含む指定情報を受け付ける処理と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定する処理と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、決定されたノードの配置 を調整する処理と、 調整されたノード配置でネットワークを表示する処理と を当該コンピュータに実行させることを特徴とする記憶 媒体。

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JP2001298453A
CLAIM 6
【請求項6】 コンピュータに実行させるプログラムを 当該コンピュータの入力手段が読取可能に記憶した記憶 媒体において、 当該プログラムは、ノードの集合及びノード間の関係を 示す属性が付与されて当該ノード間を関係付けるリンク の集合からなるネットワークの構造を規定するネットワ ークデータをネットワークデータメモリ (different schema) から読み出す処 理と、 読み出したネットワークデータにより規定されるネット ワークの構造に基づいて当該ネットワークに含まれるノ ードの配置を決定する処理と、 複数の属性の指定を含む指定情報を受け付ける処理と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定する処理と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、決定されたノードの配置 を調整する処理と、 調整されたノード配置でネットワークを表示する処理と を当該コンピュータに実行させることを特徴とする記憶 媒体。

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JP2001298453A
CLAIM 6
【請求項6】 コンピュータに実行させるプログラムを 当該コンピュータの入力手段が読取可能に記憶した記憶 媒体において、 当該プログラムは、ノードの集合及びノード間の関係を 示す属性が付与されて当該ノード間を関係付けるリンク の集合からなるネットワークの構造を規定するネットワ ークデータをネットワークデータメモリ (different schema) から読み出す処 理と、 読み出したネットワークデータにより規定されるネット ワークの構造に基づいて当該ネットワークに含まれるノ ードの配置を決定する処理と、 複数の属性の指定を含む指定情報を受け付ける処理と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定する処理と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、決定されたノードの配置 を調整する処理と、 調整されたノード配置でネットワークを表示する処理と を当該コンピュータに実行させることを特徴とする記憶 媒体。

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JP2001298453A
CLAIM 6
【請求項6】 コンピュ (processing data) ータに実行させるプログラムを 当該コンピュータの入力手段が読取可能に記憶した記憶 媒体において、 当該プログラムは、ノードの集合及びノード間の関係を 示す属性が付与されて当該ノード間を関係付けるリンク の集合からなるネットワークの構造を規定するネットワ ークデータをネットワークデータメモリから読み出す処 理と、 読み出したネットワークデータにより規定されるネット ワークの構造に基づいて当該ネットワークに含まれるノ ードの配置を決定する処理と、 複数の属性の指定を含む指定情報を受け付ける処理と、 受け付けた指定情報に基づく適合度の高さに応じた大き さの重みを各リンクの重みとして決定する処理と、 決定された重みに基づいて、付与された属性の中に指定 された属性を含むリンクにより関係付けられたノード間 の距離が他のリンクにより関係付けられたノード間の距 離と比べて小さくなるように、決定されたノードの配置 を調整する処理と、 調整されたノード配置でネットワークを表示する処理と を当該コンピュータに実行させることを特徴とする記憶 媒体。

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema (メモリ) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JP2001298453A
CLAIM 6
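[Claim 6] A storage medium storing a program to be executed by a computer [コンピュ] (processing data) such that the input means of the computer can read it, wherein the program causes the computer to execute: a process of reading, from a network data memory [メモリ] (different schema), network data that defines the structure of a network consisting of a set of nodes and a set of links that relate the nodes to one another and are assigned attributes indicating the relationships between the nodes; a process of determining the arrangement of the nodes included in the network based on the structure of the network defined by the read network data; a process of accepting designation information that includes a designation of a plurality of attributes; a process of determining, as the weight of each link, a weight whose magnitude corresponds to the degree of conformity based on the accepted designation information; a process of adjusting the determined node arrangement, based on the determined weights, so that the distance between nodes related by a link whose assigned attributes include a designated attribute becomes smaller than the distance between nodes related by other links; and a process of displaying the network with the adjusted node arrangement.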

US8190610B2
CLAIM 37
. The map-reduce method of claim 35 , wherein at least some of the reducers include a sort , group-by-key and combine task , the method further comprises generating and providing metadata (関係付け) for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001298453A
CLAIM 1
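[Claim 1] A network display device for displaying a network consisting of a set of nodes and a set of links that relate the nodes to one another [関係付け] (providing metadata) and are assigned attributes indicating the relationships between the nodes, the device comprising: network data storage means for storing network data that defines the structure of the network; node arrangement determining means for determining the arrangement of the nodes included in the network based on the structure of the network defined by the network data; accepting means for accepting designation information that includes a designation of a plurality of attributes; link weight determining means for determining, as the weight of each link, a weight whose magnitude corresponds to the degree of conformity based on the accepted designation information; node arrangement adjusting means for adjusting, based on the determined weights, the node arrangement determined by the node arrangement determining means so that the distance between nodes related by a link whose assigned attributes include a designated attribute becomes smaller than the distance between nodes related by other links; and network display means for displaying the network with the adjusted node arrangement.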

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (メモリ) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JP2001298453A
CLAIM 6
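[Claim 6] A storage medium storing a program to be executed by a computer such that the input means of the computer can read it, wherein the program causes the computer to execute: a process of reading, from a network data memory [メモリ] (different schema), network data that defines the structure of a network consisting of a set of nodes and a set of links that relate the nodes to one another and are assigned attributes indicating the relationships between the nodes; a process of determining the arrangement of the nodes included in the network based on the structure of the network defined by the read network data; a process of accepting designation information that includes a designation of a plurality of attributes; a process of determining, as the weight of each link, a weight whose magnitude corresponds to the degree of conformity based on the accepted designation information; a process of adjusting the determined node arrangement, based on the determined weights, so that the distance between nodes related by a link whose assigned attributes include a designated attribute becomes smaller than the distance between nodes related by other links; and a process of displaying the network with the adjusted node arrangement.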

US8190610B2
CLAIM 44
. The computer system of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata (関係付け) for at least some of the mapping , partitioning , combining , grouping and sorting .
JP2001298453A
CLAIM 1
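[Claim 1] A network display device for displaying a network consisting of a set of nodes and a set of links that relate the nodes to one another [関係付け] (providing metadata) and are assigned attributes indicating the relationships between the nodes, the device comprising: network data storage means for storing network data that defines the structure of the network; node arrangement determining means for determining the arrangement of the nodes included in the network based on the structure of the network defined by the network data; accepting means for accepting designation information that includes a designation of a plurality of attributes; link weight determining means for determining, as the weight of each link, a weight whose magnitude corresponds to the degree of conformity based on the accepted designation information; node arrangement adjusting means for adjusting, based on the determined weights, the node arrangement determined by the node arrangement determining means so that the distance between nodes related by a link whose assigned attributes include a designated attribute becomes smaller than the distance between nodes related by other links; and network display means for displaying the network with the adjusted node arrangement.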




US7103836B1

Filed: 2000-01-10     Issued: 2006-09-05

Method and system for generating materials for presentation on a non-frame capable web browser

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Lee Evan Nakamura, Stewart Eugene Tate
US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US7103836B1
CLAIM 1
. A method for managing internet presentation materials in a single file format for ease of administration while presenting to a requestor only those portions of a file requested , comprising : defining , in a first portion (computing devices) of the file , a first variable for first information and a second variable for second information , where said first information and said second information are in the file ;
defining , in a second portion of the file , first and second presentation layouts , wherein said first presentation layout includes said first variable and said second presentation layout includes said second variable ;
and dynamically generating a page of presentation material in response to a request for said first information by using the first variable to extract said first information from the file and placing the first information in the page of presentation material , wherein the page is generated based on the first presentation layout and includes said first information and does not contain said second information .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US7103836B1
CLAIM 8
. A system for managing internet presentation materials in a single file format for ease of administration while presenting to a requestor only those portions of a file requested , comprising : a server for communicating with a client ;
a dynamic presentation page generating unit , coupled with said server (second set) ;
a macro program , coupled with said dynamic presentation page generation unit , having first and second presentation material stored therein , wherein said dynamic presentation page generating unit , in response to a message received by said server requesting said first presentation material , dynamically generates a page of presentation material based on the macro program by using the first variable to extract said first information from the file and placing the first information in the page of presentation material , the generated page including said first presentation material and not including said second presentation material .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (first portion) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (said server) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US7103836B1
CLAIM 1
. A method for managing internet presentation materials in a single file format for ease of administration while presenting to a requestor only those portions of a file requested , comprising : defining , in a first portion (computing devices) of the file , a first variable for first information and a second variable for second information , where said first information and said second information are in the file ;
defining , in a second portion of the file , first and second presentation layouts , wherein said first presentation layout includes said first variable and said second presentation layout includes said second variable ;
and dynamically generating a page of presentation material in response to a request for said first information by using the first variable to extract said first information from the file and placing the first information in the page of presentation material , wherein the page is generated based on the first presentation layout and includes said first information and does not contain said second information .

US7103836B1
CLAIM 8
. A system for managing internet presentation materials in a single file format for ease of administration while presenting to a requestor only those portions of a file requested , comprising : a server for communicating with a client ;
a dynamic presentation page generating unit , coupled with said server (second set) ;
a macro program , coupled with said dynamic presentation page generation unit , having first and second presentation material stored therein , wherein said dynamic presentation page generating unit , in response to a message received by said server requesting said first presentation material , dynamically generates a page of presentation material based on the macro program by using the first variable to extract said first information from the file and placing the first information in the page of presentation material , the generated page including said first presentation material and not including said second presentation material .




US6505187B1

Filed: 1999-12-08     Issued: 2003-01-07

Computing multiple order-based functions in a parallel processing database system

(Original Assignee) NCR Corp     (Current Assignee) Teradata US Inc

Ambuj Shatdal
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (different partition) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6505187B1
CLAIM 16
. The method of claim 12 , further comprising re-partitioning the table using a different partition (data partitions) ing method for the table if the determining step was inaccurate .
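The method recited in claim 1 of US8190610B2 above can be pictured with a small single-process sketch. It is illustrative only and is not taken from the patent or from any cited reference: the sample records, the group names `employee` and `bonus`, and every function name are hypothetical, and an in-memory loop stands in for the distributed system.

```python
from collections import defaultdict

# Hypothetical sample data: two data groups with different schemas,
# related only by a shared key (an employee id).
employees = [(1, {"name": "Ann", "dept": "R&D"}), (2, {"name": "Bo", "dept": "Ops"})]
bonuses = [(1, {"amount": 500}), (1, {"amount": 250}), (2, {"amount": 100})]

def map_employee(key, record):
    # Group-specific map: emit the name for each employee key.
    yield key, record["name"]

def map_bonus(key, record):
    # Group-specific map: emit the bonus amount for each employee key.
    yield key, record["amount"]

def partition(records, n):
    # Split one data group into n partitions (round-robin, for illustration).
    return [records[i::n] for i in range(n)]

def run_map(group_id, records, map_fn, n_partitions=2):
    # Tag each intermediate pair with its group so it stays identifiable.
    out = []
    for part in partition(records, n_partitions):
        for key, record in part:
            for k, v in map_fn(key, record):
                out.append((k, group_id, v))
    return out

def reduce_merge(intermediate):
    # Collect values per key, kept separate per group.
    by_key = defaultdict(lambda: defaultdict(list))
    for key, group_id, value in intermediate:
        by_key[key][group_id].append(value)
    result = {}
    for key, groups in sorted(by_key.items()):
        total = sum(groups["bonus"])      # bonus values are summed
        for name in groups["employee"]:   # employee values are iterated as-is
            result[key] = (name, total)   # merged on the common key
    return result

intermediate = (run_map("employee", employees, map_employee)
                + run_map("bonus", bonuses, map_bonus))
merged = reduce_merge(intermediate)
```

Each group has its own map function and its own handling inside the reduce step, and the output rows `(name, total)` follow neither input schema.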

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (different partition) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6505187B1
CLAIM 16
. The method of claim 12 , further comprising re-partitioning the table using a different partition (data partitions) ing method for the table if the determining step was inaccurate .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (different partition) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (first one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6505187B1
CLAIM 6
. The method of claim 1 , wherein the determining step (a) comprises identifying order-specifications as being compatible when a first one (first set, second set) of the order-specifications is a left subset of a second one of the order-specifications .

US6505187B1
CLAIM 16
. The method of claim 12 , further comprising re-partitioning the table using a different partition (data partitions) ing method for the table if the determining step was inaccurate .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (different partition) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first one) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (first one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6505187B1
CLAIM 6
. The method of claim 1 , wherein the determining step (a) comprises identifying order-specifications as being compatible when a first one (first set, second set) of the order-specifications is a left subset of a second one of the order-specifications .

US6505187B1
CLAIM 16
. The method of claim 12 , further comprising re-partitioning the table using a different partition (data partitions) ing method for the table if the determining step was inaccurate .
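Claim 6 of US6505187B1, quoted above, treats two order-specifications as compatible when the first is a "left subset" of the second. The sketch below assumes that "left subset" means a leading prefix of the column list; that reading, the helper name `is_left_subset`, and the sample rows are all assumptions made for illustration and are not drawn from the reference.

```python
def is_left_subset(first, second):
    # True when `first` is a leading prefix of `second`
    # (the assumed meaning of "left subset").
    first, second = tuple(first), tuple(second)
    return len(first) <= len(second) and second[:len(first)] == first

# Hypothetical rows to show why a prefix relationship lets one sort serve both.
rows = [
    {"region": "US", "city": "Reno", "sales": 4},
    {"region": "EU", "city": "Oslo", "sales": 3},
    {"region": "EU", "city": "Bonn", "sales": 7},
]

short_spec = ("region",)
long_spec = ("region", "city")

# Sort once by the longer specification.
by_long = sorted(rows, key=lambda r: tuple(r[c] for c in long_spec))

# The same ordering also satisfies the shorter specification.
regions = [r["region"] for r in by_long]
shares_sort = regions == sorted(regions)
```

Under this reading, a function ordered by `region` and another ordered by `region, city` could be evaluated over one sorted pass, whereas `("city",)` is not a left subset of `("region", "city")` and would need its own ordering.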




US6735593B1

Filed: 1999-11-09     Issued: 2004-05-11

Systems and methods for storing data

(Original Assignee) ANSWERBRISK Ltd; LAZY SOFTWARE Ltd OF BIRCHES     (Current Assignee) SLIGO INNOVATIONS LLC

Simon Guy Williams
US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (defines one) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6735593B1
CLAIM 1
. A system for storing data , comprising a plurality of entities , each entity representing an item having a discrete , independent existence , and a plurality of associations , defining nexuses between pairs of said entities , nexuses between pairs including one (second set) of said entities and one of said plurality of associations , and nexuses between pairs of said associations .

US6735593B1
CLAIM 3
. The system of claim 1 , wherein the plurality of associations includes a correcting association which defines one (first schema) nexus as correcting a previous nexus .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema (defines one) , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6735593B1
CLAIM 1
. A system for storing data , comprising a plurality of entities , each entity representing an item having a discrete , independent existence , and a plurality of associations , defining nexuses between pairs of said entities , nexuses between pairs including one (second set) of said entities and one of said plurality of associations , and nexuses between pairs of said associations .

US6735593B1
CLAIM 3
. The system of claim 1 , wherein the plurality of associations includes a correcting association which defines one (first schema) nexus as correcting a previous nexus .




US6609131B1

Filed: 1999-09-27     Issued: 2003-08-19

Parallel partition-wise joins

(Original Assignee) Oracle International Corp     (Current Assignee) Oracle International Corp

Mohamed Zait, Beniot Dageville
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (first partition, second subsets) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partition (data partitions, partitioning step) ing criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US6609131B1
CLAIM 2
. The method of claim 1 wherein : the second object is statically partitioned based on the second partitioning criteria ;
and no dynamic partitioning is performed to establish the first and second subsets (data partitions, partitioning step) of data that are distributed to each slave process .
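
The join recited in US6609131B1 claims 1 and 2 can be illustrated with a minimal sketch, under stated assumptions. Both objects are partitioned on the join key with the same criteria, and each co-partitioned pair of subsets is then joined independently, which is the work the claim assigns to one slave process. The function names (`hash_partition`, `join_partition`, `partition_wise_join`), the dict-based rows, and the hash-modulo criteria are hypothetical. The sketch runs sequentially in one process and omits the first partitioning level and the metadata inspection step, so it is not the patented implementation.

```python
from collections import defaultdict


def hash_partition(rows, key, n):
    """Split rows into n partitions by hashing the join key (assumed criteria)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def join_partition(left, right, key):
    """Hash join of one co-partitioned pair of subsets (one worker's share)."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def partition_wise_join(first, second, key, n=4):
    """Partition both objects identically, then join matching partitions."""
    first_parts = hash_partition(first, key, n)
    second_parts = hash_partition(second, key, n)
    out = []
    # Each (lp, rp) pair could be handed to a separate process; here it is a loop.
    for lp, rp in zip(first_parts, second_parts):
        out.extend(join_partition(lp, rp, key))
    return out
```

Because both sides use the same partitioning criteria, rows with equal join keys always land in the same pair of subsets, so no data needs to be redistributed at join time.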

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (first partition, second subsets) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partition (data partitions, partitioning step) ing criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US6609131B1
CLAIM 2
. The method of claim 1 wherein : the second object is statically partitioned based on the second partitioning criteria ;
and no dynamic partitioning is performed to establish the first and second subsets (data partitions, partitioning step) of data that are distributed to each slave process .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (first partition, second subsets) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partition (data partitions, partitioning step) ing criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US6609131B1
CLAIM 2
. The method of claim 1 wherein : the second object is statically partitioned based on the second partitioning criteria ;
and no dynamic partitioning is performed to establish the first and second subsets (data partitions, partitioning step) of data that are distributed to each slave process .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (first partition, second subsets) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set (said first set) having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partition (data partitions, partitioning step) ing criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set (second set) of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US6609131B1
CLAIM 2
. The method of claim 1 wherein : the second object is statically partitioned based on the second partitioning criteria ;
and no dynamic partitioning is performed to establish the first and second subsets (data partitions, partitioning step) of data that are distributed to each slave process .
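
The grouped map-reduce recited in US8190610B2 claim 33 can be illustrated with a minimal sketch, under stated assumptions. Two data groups with different schemas are mapped by different functions, the intermediate values stay identifiable by group, and a single reduce call iterates over each group's values for a shared key. The group names (`emp`, `dept`), the tuple schemas, and all function names are hypothetical. The sketch runs in one process and omits the partitioning and distribution steps.

```python
from collections import defaultdict


def map_employees(record):
    # Assumed first schema: (emp_id, name, dept_id); emit the common key dept_id.
    emp_id, name, dept_id = record
    yield dept_id, name


def map_departments(record):
    # Assumed second schema: (dept_id, dept_name).
    dept_id, dept_name = record
    yield dept_id, dept_name


def run_map(groups, mappers):
    """Apply each group's own mapper; keep intermediate values tagged by group."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in groups.items():
        for record in records:
            for key, value in mappers[group](record):
                intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, values_by_group):
    """Iterate each group's values separately and merge them on the common key."""
    for dept_name in values_by_group.get("dept", []):
        for name in values_by_group.get("emp", []):
            yield (key, dept_name, name)


def map_reduce(groups, mappers, reducer):
    """Run the map phase, then call the reducer once per key in sorted order."""
    intermediate = run_map(groups, mappers)
    out = []
    for key in sorted(intermediate):
        out.extend(reducer(key, intermediate[key]))
    return out
```

The output rows `(dept_id, dept_name, name)` have a schema that differs from both inputs, and keys present in only one group produce no output, which mirrors an inner join on the common key.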

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said first set) so that the output data set is a merging of a portion of the first and second intermediate data set .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partitioning criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (said first set) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partitioning criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .
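
The intermediate-data partitioning recited in US8190610B2 claims 35 and 36 can be illustrated with a minimal sketch, under stated assumptions. A routing function decides which reducer partitions receive each key-value pair, so a pair may be delivered to one partition, to several, or to all of them. The names `partition_intermediate`, `hash_route`, and `broadcast_route` are hypothetical, and the sketch only builds the per-reducer lists without running any reducers.

```python
def partition_intermediate(pairs, num_reducers, route):
    """Build one list of pairs per reducer.

    route(key, value) returns an iterable of reducer indices, so a pair
    may be provided to more than one partition.
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        for idx in sorted(set(route(key, value))):
            partitions[idx % num_reducers].append((key, value))
    return partitions


def hash_route(num_reducers):
    """Conventional routing: each pair goes to exactly one reducer."""
    return lambda key, value: [hash(key) % num_reducers]


def broadcast_route(num_reducers):
    """Routing in which every pair is provided to every reducer."""
    return lambda key, value: range(num_reducers)
```

With `broadcast_route`, every reducer sees the complete intermediate data set; with `hash_route`, each key is owned by a single reducer.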

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (said first set) are provided to all of the reducers .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partitioning criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (first partition, second subsets) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set (said first set) having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (second set) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partition (data partitions, partitioning step) ing criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set (second set) of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US6609131B1
CLAIM 2
. The method of claim 1 wherein : the second object is statically partitioned based on the second partitioning criteria ;
and no dynamic partitioning is performed to establish the first and second subsets (data partitions, partitioning step) of data that are distributed to each slave process .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (said first set) so that the output data set is a merging of a portion of the first and second intermediate data set .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partitioning criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (said first set) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partitioning criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (said first set) are provided to all of the reducers .
US6609131B1
CLAIM 1
. A method for processing a statement that specifies a join between a first object and a second object based on a join key , the method comprising the steps of : performing multi-level static partitioning of the first object , including performing the steps of : statically partitioning the first object at a first level by applying a first partitioning criteria to the first object to produce a first set of partitions ;
and statically partitioning the first object at a second level based on said join key , wherein the static partitioning based on said join key is performed by applying a second partitioning criteria to each partition in said first set (intermediate data set) of partitions to produce a second set of partitions ;
during execution of said statement , using the second level of static partitioning of said first object as a basis for distributing data of said first object , including performing the steps of : inspecting partitioning metadata based on said statement to determine that the second level of static partitioning of said first object , and not said first level of static partitioning , should be used as a basis for distributing data from said first object to a plurality of slave processes ;
distributing data from said first and second objects to each slave process of said plurality of slave processes based on said second partitioning criteria , including distributing to each slave process of said plurality of slave processes a first subset of data from said first object , wherein the first subset of data that is distributed to each slave process is established based on said second set of partitions ;
distributing to each slave process of said plurality of slave processes a second subset of data from said second object , wherein the second subset of data that is distributed to each slave process is established based on said second partitioning criteria ;
and causing each slave process of said plurality of slave processes to perform a join between the first subset of data assigned to the slave process and the second subset of data assigned to the slave process .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6446048B1

Filed: 1999-09-03     Issued: 2002-09-03

Web-based entry of financial transaction information and subsequent download of such information

(Original Assignee) Intuit Inc     (Current Assignee) Intuit Inc

Michael L. Wells, Joseph W. Wells
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (sending information) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (sending information) has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6446048B1
CLAIM 17
. The system of claim 16 wherein the client computers and input devices provide , modify and/or retrieve the financial information by sending information (data partition, first data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) -exchange-requests to the web-site , each request being one of (a) a request to store information , (b) a request to modify previously stored information , (c) a request to delete previously stored information , or (d) a request to retrieve previously stored information .
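
Claim 1 of US8190610B2 above recites partitioning each data group, mapping groups with different schemas separately, keeping the intermediate data identifiable to its group, and merging on a common key in the reduce step. The following is a minimal single-process sketch of that flow, not the patentee's implementation. The employee and department records, the `dept_id` key, and every function name are assumptions chosen for illustration.

```python
from collections import defaultdict

def partition(records, n):
    """Split one data group into at most n contiguous data partitions."""
    size = max(1, -(-len(records) // n))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_employee(record):
    """Mapper for the first group; assumed schema (name, dept_id)."""
    name, dept_id = record
    yield dept_id, name

def map_department(record):
    """Mapper for the second group; assumed schema (dept_id, dept_name, floor)."""
    dept_id, dept_name, _floor = record
    yield dept_id, dept_name

def join_reducer(key, iterators):
    """Merge both groups on the common key, using one iterator per group."""
    dept_names = list(iterators["department"])
    for employee in iterators["employee"]:
        for dept_name in dept_names:
            yield key, employee, dept_name

def run_grouped_mapreduce(groups, mappers, reducer, n_partitions=2):
    # intermediate[key][group] keeps values identifiable to their data group.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in groups.items():
        for part in partition(records, n_partitions):
            for record in part:
                for key, value in mappers[group](record):
                    intermediate[key][group].append(value)
    output = []
    for key in sorted(intermediate):
        per_group = intermediate[key]
        iterators = {g: iter(per_group.get(g, [])) for g in groups}
        output.extend(reducer(key, iterators))
    return output
```

In a real distributed run the partitions would go to separate map workers and the keys to separate reducers; the sequential loops here only stand in for that scheduling.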

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (sending information) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group (sending information) has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6446048B1
CLAIM 17
. The system of claim 16 wherein the client computers and input devices provide , modify and/or retrieve the financial information by sending information (data partition, first data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) -exchange-requests to the web-site , each request being one of (a) a request to store information , (b) a request to modify previously stored information , (c) a request to delete previously stored information , or (d) a request to retrieve previously stored information .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (sending information) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (sending information) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6446048B1
CLAIM 17
. The system of claim 16 wherein the client computers and input devices provide , modify and/or retrieve the financial information by sending information (data partition, first data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) -exchange-requests to the web-site , each request being one of (a) a request to store information , (b) a request to modify previously stored information , (c) a request to delete previously stored information , or (d) a request to retrieve previously stored information .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (sending information) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (sending information) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6446048B1
CLAIM 17
. The system of claim 16 wherein the client computers and input devices provide , modify and/or retrieve the financial information by sending information (data partition, first data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) -exchange-requests to the web-site , each request being one of (a) a request to store information , (b) a request to modify previously stored information , (c) a request to delete previously stored information , or (d) a request to retrieve previously stored information .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6330653B1

Filed: 1999-04-30     Issued: 2001-12-11

Manipulation of virtual and live computer storage device partitions

(Original Assignee) Powerquest Corp     (Current Assignee) Veritas Technologies LLC

Golden E. Murray, David I. Marsh, Robert S. Raymond, Troy Millett, Damon Janis, Russell J. Marsh, Paul E. Madden
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (simulation result, allocation table) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (simulation result, allocation table) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 50
. The method of claim 46 , wherein the optimized list specifies free space and then (second data group) creates an operating system partition for installing an operating system .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (simulation result, allocation table) is a plurality of output data groups (free space) .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 25
. The method of claim 1 , wherein the simulating step includes simulated partition resizing by expanding a simulated extended partition into free space (output data groups) obtained by manipulating a simulated partition outside the simulated extended partition .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (simulation result, allocation table) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (simulation result, allocation table) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (simulation result, allocation table) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (simulation result, allocation table) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (simulation result, allocation table) group , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 17
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (simulation result, allocation table) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (simulation result, allocation table) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 50
. The method of claim 46 , wherein the optimized list specifies free space and then (second data group) creates an operating system partition for installing an operating system .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 18
. The computer system (computer system) of claim 17 , wherein : the at least one output data group (simulation result, allocation table) is a plurality of output data groups (free space) .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 25
. The method of claim 1 , wherein the simulating step includes simulated partition resizing by expanding a simulated extended partition into free space (output data groups) obtained by manipulating a simulated partition outside the simulated extended partition .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 19
. The computer system (computer system) of claim 17 , wherein : corresponding intermediate data for a data group (simulation result, allocation table) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 20
. The computer system (computer system) of claim 19 , wherein : corresponding intermediate data for a data group (simulation result, allocation table) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 21
. The computer system (computer system) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (simulation result, allocation table) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 22
. The computer system (computer system) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data (simulation result, allocation table) group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 23
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 24
. The computer system (computer system) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 25
. The computer system (computer system) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .
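
Claims 23 through 25 of US8190610B2 above distinguish two ways of partitioning intermediate key/value pairs before reduction: each pair goes to a separate one of the partitions (claim 23), or pairs go to more than one partition, up to all pairs going to all partitions (claims 24 and 25). The following is a minimal illustrative sketch of that distinction, not the patentee's implementation. The function names `bucket_of`, `partition_exclusive`, `partition_replicated`, and `run_reducers`, and the CRC-based bucket choice, are hypothetical and do not appear in either patent.

```python
import zlib


def bucket_of(key, n):
    """Pick a partition index for a key (assumed scheme: CRC32 of repr(key))."""
    return zlib.crc32(repr(key).encode("utf-8")) % n


def partition_exclusive(pairs, n):
    """Claim 23 style: every key/value pair lands in exactly one partition."""
    parts = [[] for _ in range(n)]
    for key, value in pairs:
        parts[bucket_of(key, n)].append((key, value))
    return parts


def partition_replicated(pairs, n):
    """Claim 25 style: every key/value pair is provided to every partition."""
    return [list(pairs) for _ in range(n)]


def run_reducers(partitions, reducer):
    """Provide the intermediate data of each partition to its own reducer call."""
    return [reducer(part) for part in partitions]


# Hypothetical intermediate data and a trivial reducer that counts pairs.
pairs = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
exclusive = partition_exclusive(pairs, 3)
replicated = partition_replicated(pairs, 3)
counts_exclusive = run_reducers(exclusive, len)
counts_replicated = run_reducers(replicated, len)
```

In the exclusive case the per-reducer counts sum to the number of pairs, while in the replicated case each reducer sees the full set.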

US8190610B2
CLAIM 26
. The computer system (computer system) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 27
. The computer system (computer system) of claim 26 , wherein : the reducing includes processing the metadata .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 28
. The computer system (computer system) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (simulation result, allocation table) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US8190610B2
CLAIM 29
. The computer system (computer system) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 30
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 31
. The computer system (computer system) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 32
. The computer system (computer system) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (computer system) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (simulation result, allocation table) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (simulation result, allocation table) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data (same computer) set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 44
. The method of claim 43 , wherein the optimization removes a first set (first set) of one or more partition operations that is made redundant by a second , later set of one or more partition operations .

US6330653B1
CLAIM 50
. The method of claim 46 , wherein the optimized list specifies free space and then creates (second data group) an operating system partition for installing an operating system .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US6330653B1
CLAIM 84
. The system of claim 83 , wherein the storage medium is directly attached to the same computer (first intermediate data, first intermediate data set) as the virtual engine data structure .
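
For orientation, the map-reduce method recited in claim 33 of US8190610B2 above can be sketched as follows: two data sets with different schemas are partitioned and mapped independently, the intermediate pairs are tagged by data group, and a single reduce step iterates over each group's values for a common key to emit records in a third schema. This is a simplified, single-process sketch under stated assumptions, not the claimed distributed system. The sample data, the `dept_id` join key, and every function name here are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: two data groups with different schemas sharing "dept_id".
employees = [("e1", {"name": "Ann", "dept_id": 10}),
             ("e2", {"name": "Bob", "dept_id": 20})]
departments = [("d10", {"dept_id": 10, "dept_name": "Sales"}),
               ("d20", {"dept_id": 20, "dept_name": "Ops"})]


def partition(records, n):
    """Split a data set into n data partitions (round-robin, an assumed scheme)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the first group: emit (dept_id, name)."""
    return [(rec["dept_id"], rec["name"]) for _, rec in part]


def map_department(part):
    """Map function for the second group: emit (dept_id, dept_name)."""
    return [(rec["dept_id"], rec["dept_name"]) for _, rec in part]


def run_map(records, map_fn, group, n=2):
    """Apply map_fn to each partition and tag every output pair with its group."""
    out = []
    for part in partition(records, n):
        for key, value in map_fn(part):
            out.append((key, group, value))
    return out


def reduce_together(intermediate):
    """Reduce both intermediate sets together, one value list per group and key."""
    grouped = defaultdict(lambda: defaultdict(list))
    for key, group, value in intermediate:
        grouped[key][group].append(value)
    output = []
    for key in sorted(grouped):
        dept_names = grouped[key]["department"]
        for name in grouped[key]["employee"]:   # iterate the first group's values
            for dept_name in dept_names:        # iterate the second group's values
                output.append((name, dept_name))  # output has a third schema
    return output


intermediate = (run_map(employees, map_employee, "employee")
                + run_map(departments, map_department, "department"))
result = reduce_together(intermediate)
```

Here `result` pairs each employee name with the department name that shares its key, which is the kind of cross-group relation the reduce step in the claim is directed to.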

US8190610B2
CLAIM 40
. A computer system (computer system) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (simulation result, allocation table) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (simulation result, allocation table) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data (same computer) set having a first set (first set) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6330653B1
CLAIM 1
. A method for computer partition manipulation , comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a computer storage medium , each virtual partition having a specified starting sector address and a specified ending sector address , and also having an indication whether the virtual partition is formatted to a file system ;
simulating the performance of at least one of the desired partition operations to obtain simulation results (particular data, data partition, data group, particular data group) by manipulating the virtual partition without necessarily modifying on-disk system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 44
. The method of claim 43 , wherein the optimization removes a first set (first set) of one or more partition operations that is made redundant by a second , later set of one or more partition operations .

US6330653B1
CLAIM 50
. The method of claim 46 , wherein the optimized list specifies free space and then creates (second data group) an operating system partition for installing an operating system .

US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US6330653B1
CLAIM 62
. The configured program storage medium of claim 60 , wherein the data structure on the non-volatile storage device includes a file allocation table (particular data, data partition, data group, particular data group) .

US6330653B1
CLAIM 84
. The system of claim 83 , wherein the storage medium is directly attached to the same computer (first intermediate data, first intermediate data set) as the virtual engine data structure .

US8190610B2
CLAIM 41
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 42
. The computer system (computer system) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 43
. The computer system (computer system) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 44
. The computer system (computer system) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 45
. The computer system (computer system) of claim 44 , wherein the reducing includes processing the metadata .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .

US8190610B2
CLAIM 46
. The computer system (computer system) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6330653B1
CLAIM 57
. A computer program storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system (computer system) to perform method steps for partition manipulation , the method steps comprising the steps of : obtaining from a user at least one user command , each user command corresponding to at least one desired operation on a virtual partition , each virtual partition corresponding at least initially to a live partition on a non-volatile computer storage device ;
simulating the performance of at least one of the desired partition operations to obtain simulation results by manipulating the virtual partition without necessarily modifying on-device system structures that specify the live partition ;
and providing the simulation results to the user .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
JPH11327982A

Filed: 1999-04-07     Issued: 1999-11-30

Failure recovery method for a distributed database system

(Original Assignee) Lucent Technol Inc

Mark Lawrence Blood, Stephen Dexter Coomer, David Dayton Nason, Mohamad-Reza Yamini
US8190610B2
CLAIM 1
. A method of processing data (コンピュ) of a data set (有する第1, アップ) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
JPH11327982A
CLAIM 14
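. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .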

JPH11327982A
CLAIM 34
. The distributed processor network according to claim 33 , wherein at least one database record comprises random access memory [ランダムアクセスメモリ] (different schema) .

JPH11327982A
CLAIM 38
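. The distributed processor network according to claim 37 , wherein the storage means is a computer [コンピュータ] (processing data) readable medium , and the medium stores a data structure including a first data storage element stored in a first region of memory addresses of the medium .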

JPH11327982A
CLAIM 39
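. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .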
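
For orientation, the grouped map and reduce steps recited in claim 1 of US8190610B2 above can be sketched roughly as follows. This is a minimal single-process illustration, not the patentee's implementation: the two data groups (`employees`, `departments`), their schemas, the round-robin `partition` helper, and all function names are assumptions made up for the example. Each group gets its own map function, the intermediate data stays tagged with its group, and the reduce step merges the groups on the key they have in common.

```python
from collections import defaultdict

def map_employees(record):
    # Hypothetical schema for group 1: (emp_id, name, dept_id); emit dept_id as the key.
    emp_id, name, dept_id = record
    yield dept_id, name

def map_departments(record):
    # Hypothetical schema for group 2: (dept_id, dept_name); same key, different values.
    dept_id, dept_name = record
    yield dept_id, dept_name

def partition(records, n):
    # Split one data group into n partitions (round-robin, chosen only for simplicity).
    return [records[i::n] for i in range(n)]

def run_map(group_name, partitions, map_fn):
    # Apply the group's map function to each partition; keep the output tagged by group.
    intermediate = defaultdict(list)
    for part in partitions:
        for record in part:
            for key, value in map_fn(record):
                intermediate[key].append(value)
    return group_name, intermediate

def reduce_join(key, values_by_group):
    # One list of values per group; merge the groups on the common key.
    for dept_name in values_by_group.get("departments", []):
        for emp_name in values_by_group.get("employees", []):
            yield key, emp_name, dept_name

def grouped_map_reduce(groups, num_partitions=2):
    # groups: dict of group name -> (records, map function for that group).
    tagged = [run_map(name, partition(recs, num_partitions), fn)
              for name, (recs, fn) in groups.items()]
    keys = sorted({k for _, inter in tagged for k in inter})
    output = []
    for key in keys:
        values_by_group = {name: inter[key] for name, inter in tagged if key in inter}
        output.extend(reduce_join(key, values_by_group))
    return output

employees = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]
departments = [(10, "Eng"), (20, "Ops")]
result = grouped_map_reduce({
    "employees": (employees, map_employees),
    "departments": (departments, map_departments),
})
```

The output rows `(dept_id, emp_name, dept_name)` have a schema that differs from both inputs, which is the situation the independent claims describe. A distributed run would place the map and reduce calls on separate machines; that part is deliberately left out here.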

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JPH11327982A
CLAIM 34
. The distributed processor network according to claim 33 , wherein at least one database record comprises random access memory [ランダムアクセスメモリ] (different schema) .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (コンピュ) that is not intermediate data .
JPH11327982A
CLAIM 38
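. The distributed processor network according to claim 37 , wherein the storage means is a computer [コンピュータ] (processing data) readable medium , and the medium stores a data structure including a first data storage element stored in a first region of memory addresses of the medium .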

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (有する第1, アップ) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (メモリ) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
JPH11327982A
CLAIM 14
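. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .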

JPH11327982A
CLAIM 34
. The distributed processor network according to claim 33 , wherein at least one database record comprises random access memory [ランダムアクセスメモリ] (different schema) .

JPH11327982A
CLAIM 39
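. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .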

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (メモリ) than the iterator corresponding to another particular data group , for that reducer .
JPH11327982A
CLAIM 34
. The distributed processor network according to claim 33 , wherein at least one database record comprises random access memory [ランダムアクセスメモリ] (different schema) .

US8190610B2
CLAIM 29
. The computer system of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (コンピュ) that is not intermediate data .
JPH11327982A
CLAIM 38
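. The distributed processor network according to claim 37 , wherein the storage means is a computer [コンピュータ] (processing data) readable medium , and the medium stores a data structure including a first data storage element stored in a first region of memory addresses of the medium .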

US8190610B2
CLAIM 33
. A map-reduce method of processing data (コンピュ) from a plurality of groups having different schema (メモリ) over a computer system , the method comprising : for a first data set (有する第1, アップ) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (有する第1, アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (有する第1, アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
JPH11327982A
CLAIM 14
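. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .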

JPH11327982A
CLAIM 34
. The distributed processor network according to claim 33 , wherein at least one database record comprises random access memory [ランダムアクセスメモリ] (different schema) .

JPH11327982A
CLAIM 38
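. The distributed processor network according to claim 37 , wherein the storage means is a computer [コンピュータ] (processing data) readable medium , and the medium stores a data structure including a first data storage element stored in a first region of memory addresses of the medium .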

JPH11327982A
CLAIM 39
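. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .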

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (有する第1, アップ) so that the output data set is a merging of a portion of the first and second intermediate data set .
JPH11327982A
CLAIM 14
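. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .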

JPH11327982A
CLAIM 39
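. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .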

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (有する第1, アップ) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JPH11327982A
CLAIM 14
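. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .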

JPH11327982A
CLAIM 39
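. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .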

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (有する第1, アップ) are provided to all of the reducers .
JPH11327982A
CLAIM 14
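. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .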

JPH11327982A
CLAIM 39
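. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .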
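
Claims 35 and 36 of US8190610B2 above turn on how intermediate key-value pairs are distributed to reducers: some pairs may land in more than one partition (claim 35), and in the limiting case every pair reaches every reducer (claim 36). The sketch below shows one way such a partitioner could behave. It assumes integer keys and a hypothetical `broadcast_keys` parameter naming the keys to replicate; neither detail comes from the patent text.

```python
def partition_intermediate(pairs, num_reducers, broadcast_keys=frozenset()):
    """Assign (key, value) pairs to reducer partitions.

    Pairs whose key is in broadcast_keys are copied into every partition;
    all other pairs go to exactly one partition, chosen by key modulo
    num_reducers (an assumption that only works for integer keys).
    """
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        if key in broadcast_keys:
            # Replicated pair: visible to every reducer.
            for part in partitions:
                part.append((key, value))
        else:
            partitions[key % num_reducers].append((key, value))
    return partitions

pairs = [(1, "a"), (2, "b"), (3, "c")]

# Claim 35 style: only key 3 is provided to more than one partition.
some_shared = partition_intermediate(pairs, 2, broadcast_keys={3})

# Claim 36 style: every pair is provided to every reducer.
all_shared = partition_intermediate(pairs, 2, broadcast_keys={1, 2, 3})
```

Replication of this kind trades network and memory cost for the ability of each reducer to see shared reference data, which is one plausible reading of why the dependent claims call it out separately.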

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (有する第1, アップ) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (有する第1, アップ) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (有する第1, アップ) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (メモリ) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
JPH11327982A
CLAIM 14
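. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .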

JPH11327982A
CLAIM 34
. The distributed processor network according to claim 33 , wherein at least one database record comprises random access memory [ランダムアクセスメモリ] (different schema) .

JPH11327982A
CLAIM 39
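. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .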

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (有する第1, アップ) so that the output data set is a merging of a portion of the first and second intermediate data set .
JPH11327982A
CLAIM 14
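. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .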

JPH11327982A
CLAIM 39
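. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .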

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set (有する第1, アップ) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
JPH11327982A
CLAIM 14
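. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .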

JPH11327982A
CLAIM 39
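. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .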

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (有する第1, アップ) are provided to all of the reducers .
JPH11327982A
CLAIM 14
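. The method according to claim 6 , further comprising the steps of : e. designating a processor as a backup [バックアップ] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) master processor ; f. creating a journal associated with the backup master processor ; and g. recording , in the journal associated with the backup master processor , the information recorded in the journal associated with the master processor .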

JPH11327982A
CLAIM 39
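. A distributed processor network in which each processor is associated with a database , the distributed processor network comprising : a master processor that updates the database associated with each processor ; a plurality of slave processors that record database updates transmitted by the master processor ; a first [有する第1] (second set, first set, data set, first data set, second data set, second intermediate data set, output data set, intermediate data set) timer , associated with each slave processor , having a first timer window ; a second timer , associated with each slave processor , having a second timer window ; a journal associated with the master processor for recording the steps of a database update transaction generated by the master processor ; and memory means for storing an indicator of the last completed database update transaction executed by the slave processor .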




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6101495A

Filed: 1998-09-15     Issued: 2000-08-08

Method of executing partition operations in a parallel database system

(Original Assignee) Hitachi Ltd     (Current Assignee) Hitachi Ltd

Masashi Tsuchida, Kazuo Masai, Shunichi Torii
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (new partitions) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (new partitions) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6101495A
CLAIM 15
. A database management method according to claim 14 , wherein said partition operation adding a partition creates two new partitions (first data, data partition, s corresponding data partition) from one previous partition .

US6101495A
CLAIM 19
. A database management method according to claim 11 , wherein a specific value (first data group) of said key range is indicated by one of an upper bound , a lower bound , a range of value , and a number of data storage areas , of data to be stored in a data storage area .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition (new partitions) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (new partitions) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6101495A
CLAIM 15
. A database management method according to claim 14 , wherein said partition operation adding a partition creates two new partitions (first data, data partition, s corresponding data partition) from one previous partition .

US6101495A
CLAIM 19
. A database management method according to claim 11 , wherein a specific value (first data group) of said key range is indicated by one of an upper bound , a lower bound , a range of value , and a number of data storage areas , of data to be stored in a data storage area .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data (new partitions) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (new partitions) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (current data) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6101495A
CLAIM 1
. A database divisional management method of dividing a database in to plural partitions and storing the plural partitions in a storage medium , the method comprising the steps of : partitioning a key range of data to be stored in said database , into a plurality of sub-key ranges ;
setting a plurality of data storage areas in said storage medium , each one of said plurality of data storage areas corresponding to one of said plurality of sub-key ranges ;
calculating a key value corresponding to said data to be stored in said database , when said data is to be stored in said database ;
and storing said data in said storage areas corresponding to said key value ;
closing , when online processing is in progress , the key range of a database table corresponding to the data storage area to be added in a case where a data storage area is to be added ;
newly assigning a data storage area ;
succeeding a lock information and said directory information ;
rewriting said dictionary information necessary for controlling apportioning of data storage areas ;
removing data from the current data (second set, value pairs) storage area to the newly assigned data storage area ;
and releasing the closing of said key range , when online processing is in progress .

US6101495A
CLAIM 15
. A database management method according to claim 14 , wherein said partition operation adding a partition creates two new partitions (first data, data partition, s corresponding data partition) from one previous partition .

US6101495A
CLAIM 19
. A database management method according to claim 11 , wherein a specific value (first data group) of said key range is indicated by one of an upper bound , a lower bound , a range of value , and a number of data storage areas , of data to be stored in a data storage area .
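
The US6101495A claims quoted above (claims 1, 15 and 19) describe dividing a key range into sub-key ranges, routing each row to the storage area for its key, and adding a partition by splitting one existing partition into two. A minimal in-memory sketch of that idea follows. The class name `KeyRangeTable`, the use of exclusive upper bounds, and the list-backed "storage areas" are illustrative assumptions; the locking, dictionary rewriting and online key-range closing steps recited in claim 1 are not modeled.

```python
import bisect

class KeyRangeTable:
    """Routes keys to storage areas using sorted upper bounds of sub-key ranges."""

    def __init__(self, upper_bounds):
        # upper_bounds are exclusive and sorted; the last area has no upper bound.
        self.bounds = list(upper_bounds)
        self.areas = [[] for _ in range(len(self.bounds) + 1)]

    def area_index(self, key):
        # Number of bounds <= key gives the index of the area holding this key.
        return bisect.bisect_right(self.bounds, key)

    def store(self, key, row):
        # Store the row in the storage area that corresponds to its key value.
        self.areas[self.area_index(key)].append((key, row))

    def split(self, index, new_bound):
        # Replace area `index` with two new areas divided at new_bound,
        # moving the existing rows into whichever new area now owns them.
        old = self.areas[index]
        low = [(k, r) for k, r in old if k < new_bound]
        high = [(k, r) for k, r in old if k >= new_bound]
        self.bounds.insert(index, new_bound)
        self.areas[index:index + 1] = [low, high]

table = KeyRangeTable([100, 200])
for key, row in [(50, "a"), (120, "b"), (180, "c"), (250, "d")]:
    table.store(key, row)

# Split the middle area [100, 200) into [100, 150) and [150, 200).
table.split(1, 150)
```

Note that this partitions stored rows by key range within one database table, whereas the US8190610B2 claims partition each data group and hand the partitions to per-group map functions; the sketch is only meant to make the cited reference's mechanism concrete.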

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (new partitions) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition (new partitions) to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (current data) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6101495A
CLAIM 1
. A database divisional management method of dividing a database in to plural partitions and storing the plural partitions in a storage medium , the method comprising the steps of : partitioning a key range of data to be stored in said database , into a plurality of sub-key ranges ;
setting a plurality of data storage areas in said storage medium , each one of said plurality of data storage areas corresponding to one of said plurality of sub-key ranges ;
calculating a key value corresponding to said data to be stored in said database , when said data is to be stored in said database ;
and storing said data in said storage areas corresponding to said key value ;
closing , when online processing is in progress , the key range of a database table corresponding to the data storage area to be added in a case where a data storage area is to be added ;
newly assigning a data storage area ;
succeeding a lock information and said directory information ;
rewriting said dictionary information necessary for controlling apportioning of data storage areas ;
removing data from the current data (second set, value pairs) storage area to the newly assigned data storage area ;
and releasing the closing of said key range , when online processing is in progress .

US6101495A
CLAIM 15
. A database management method according to claim 14 , wherein said partition operation adding a partition creates two new partitions (first data, data partition, s corresponding data partition) from one previous partition .

US6101495A
CLAIM 19
. A database management method according to claim 11 , wherein a specific value (first data group) of said key range is indicated by one of an upper bound , a lower bound , a range of value , and a number of data storage areas , of data to be stored in a data storage area .




US6405198B1

Filed: 1998-09-04     Issued: 2002-06-11

Complex data query support in a partitioned database system

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Roger Georges Bitar, Jean Chen Ho, Jing-Song Jang, Erik Allan Kane, James Louis Keesey, Angela Go Reyda, Gerald Johann Wilmot
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data object) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6405198B1
CLAIM 3
. A computer-executed method for entering information concerning a complex data type into a partitioned database of a relational database processing system , the database including one or more metadata tables for extending a relation of the database , comprising : receiving a complex data object (different schema) for storage in the database ;
determining a location of a partition of the database in which the object is to be entered ;
entering the object in the partition ;
and storing relational extender metadata information for the object at the location .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (data object) than the iterator corresponding to another particular data group , for that reducer .
US6405198B1
CLAIM 3
. A computer-executed method for entering information concerning a complex data type into a partitioned database of a relational database processing system , the database including one or more metadata tables for extending a relation of the database , comprising : receiving a complex data object (different schema) for storage in the database ;
determining a location of a partition of the database in which the object is to be entered ;
entering the object in the partition ;
and storing relational extender metadata information for the object at the location .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema (data object) than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6405198B1
CLAIM 3
. A computer-executed method for entering information concerning a complex data type into a partitioned database of a relational database processing system , the database including one or more metadata tables for extending a relation of the database , comprising : receiving a complex data object (different schema) for storage in the database ;
determining a location of a partition of the database in which the object is to be entered ;
entering the object in the partition ;
and storing relational extender metadata information for the object at the location .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (data object) than the iterator corresponding to another particular data group , for that reducer .
US6405198B1
CLAIM 3
. A computer-executed method for entering information concerning a complex data type into a partitioned database of a relational database processing system , the database including one or more metadata tables for extending a relation of the database , comprising : receiving a complex data object (different schema) for storage in the database ;
determining a location of a partition of the database in which the object is to be entered ;
entering the object in the partition ;
and storing relational extender metadata information for the object at the location .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (data object) over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6405198B1
CLAIM 1
. A computer-executed method for changing a relational table in a partitioned relational database system , the relational table having at least one column defined for a complex data type , the system including one (second set) or more metadata tables for extending a relation defined by the relational table , comprising : dividing the relational table into partitions ;
storing each partition at a respective node of the system ;
and for each partition , storing rows of a metadata table that are associated with contents of the partition at the node where the partition is stored .

US6405198B1
CLAIM 3
. A computer-executed method for entering information concerning a complex data type into a partitioned database of a relational database processing system , the database including one or more metadata tables for extending a relation of the database , comprising : receiving a complex data object (different schema) for storage in the database ;
determining a location of a partition of the database in which the object is to be entered ;
entering the object in the partition ;
and storing relational extender metadata information for the object at the location .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set (including one) of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (data object) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6405198B1
CLAIM 1
. A computer-executed method for changing a relational table in a partitioned relational database system , the relational table having at least one column defined for a complex data type , the system including one (second set) or more metadata tables for extending a relation defined by the relational table , comprising : dividing the relational table into partitions ;
storing each partition at a respective node of the system ;
and for each partition , storing rows of a metadata table that are associated with contents of the partition at the node where the partition is stored .

US6405198B1
CLAIM 3
. A computer-executed method for entering information concerning a complex data type into a partitioned database of a relational database processing system , the database including one or more metadata tables for extending a relation of the database , comprising : receiving a complex data object (different schema) for storage in the database ;
determining a location of a partition of the database in which the object is to be entered ;
entering the object in the partition ;
and storing relational extender metadata information for the object at the location .




US6278989B1

Filed: 1998-08-25     Issued: 2001-08-21

Histogram construction using adaptive random sampling with cross-validation for database systems

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya
US8190610B2
CLAIM 1
. A method of processing data (square root) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6278989B1
CLAIM 6
. The method of claim 1 , wherein the determining step (c) comprises the steps of : (i) partitioning the additional sample of data values over the histogram , (ii) determining an error in distribution of the additional sample of data values over the histogram , (iii) updating the histogram using the additional sample of data values , and (iv) repeating step (first data, first data group, second data group) s (i) , (ii) , and (iii) until the error in distribution of the additional sample of data values over the histogram is less than or equal to a predetermined threshold indicating the histogram is within the predetermined degree of accuracy .

US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 8
. The method of claim 7 , wherein : in the partitioning step (partitioning step) , all of the key/value pairs of the intermediate data are provided to all of the partitions .
US6278989B1
CLAIM 7
. The method of claim 6 , wherein the partitioning step (partitioning step) (c)(i) comprises the step of partitioning the data values of the additional sample into a plurality of bins in accordance with the histogram ;
and wherein the determining step (c)(ii) comprises the steps of determining for each bin the difference in the number of data values of the additional sample in the bin from a predetermined number of data values designated for the bin and determining the error in distribution based on the determined differences .

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (square root) that is not intermediate data .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 17
. A computer system (square root) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (repeating step) group has a different schema than the data of a second data group (repeating step) and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6278989B1
CLAIM 6
. The method of claim 1 , wherein the determining step (c) comprises the steps of : (i) partitioning the additional sample of data values over the histogram , (ii) determining an error in distribution of the additional sample of data values over the histogram , (iii) updating the histogram using the additional sample of data values , and (iv) repeating step (first data, first data group, second data group) s (i) , (ii) , and (iii) until the error in distribution of the additional sample of data values over the histogram is less than or equal to a predetermined threshold indicating the histogram is within the predetermined degree of accuracy .

US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 18
. The computer system (square root) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 19
. The computer system (square root) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 20
. The computer system (square root) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 21
. The computer system (square root) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 22
. The computer system (square root) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 23
. The computer system (square root) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 24
. The computer system (square root) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 25
. The computer system (square root) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 26
. The computer system (square root) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 27
. The computer system (square root) of claim 26 , wherein : the reducing includes processing the metadata .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 28
. The computer system (square root) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 29
. The computer system (square root) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (square root) that is not intermediate data .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 30
. The computer system (square root) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 31
. The computer system (square root) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 32
. The computer system (square root) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (square root) from a plurality of groups having different schema over a computer system (square root) , the method comprising : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6278989B1
CLAIM 6
. The method of claim 1 , wherein the determining step (c) comprises the steps of : (i) partitioning the additional sample of data values over the histogram , (ii) determining an error in distribution of the additional sample of data values over the histogram , (iii) updating the histogram using the additional sample of data values , and (iv) repeating step (first data, first data group, second data group) s (i) , (ii) , and (iii) until the error in distribution of the additional sample of data values over the histogram is less than or equal to a predetermined threshold indicating the histogram is within the predetermined degree of accuracy .

US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .
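
For orientation only, the data flow recited in claim 33 of US8190610B2 (two data groups with different schemas, mapped independently and then reduced together on a common key) can be sketched roughly as below. This is a single-process illustration, not the patentee's implementation and not a claim construction. The employee and department schemas, the function names, and the round-robin partitioning are all assumptions made for the example.

```python
from collections import defaultdict

def partition(records, n):
    """Split a list of (key, value) pairs into n round-robin partitions."""
    return [records[i::n] for i in range(n)]

def map_employees(part):
    # First schema (hypothetical): key = dept_id, value = (emp_name, salary)
    for dept_id, (name, _salary) in part:
        yield dept_id, name

def map_departments(part):
    # Second schema (hypothetical): key = dept_id, value = (dept_name, city)
    for dept_id, (dept_name, _city) in part:
        yield dept_id, dept_name

def run_map(records, map_fn, n_parts=2):
    """Map each partition independently and collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_parts):
        for key, value in map_fn(part):
            intermediate[key].append(value)
    return intermediate

def reduce_groups(first, second):
    """Iterate over the keys both intermediate sets share and merge their values."""
    output = []
    for key in sorted(first.keys() & second.keys()):
        for name in first[key]:        # iterator over the first group's values
            for dept in second[key]:   # iterator over the second group's values
                # Output schema (emp_name, dept_name) differs from both inputs.
                output.append((name, dept))
    return output
```

In a distributed setting each partition and each reducer would run on a separate node; the sketch keeps everything in one process so that the grouping and merging logic stays visible.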

US8190610B2
CLAIM 40
. A computer system (square root) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (repeating step) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group (repeating step) and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6278989B1
CLAIM 6
. The method of claim 1 , wherein the determining step (c) comprises the steps of : (i) partitioning the additional sample of data values over the histogram , (ii) determining an error in distribution of the additional sample of data values over the histogram , (iii) updating the histogram using the additional sample of data values , and (iv) repeating step (first data, first data group, second data group) s (i) , (ii) , and (iii) until the error in distribution of the additional sample of data values over the histogram is less than or equal to a predetermined threshold indicating the histogram is within the predetermined degree of accuracy .

US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .
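
To make the cited US6278989B1 language easier to compare against the target claims, the sampling loop described in its claims 6 and 11 can be sketched as below. This is a rough illustration under stated assumptions, not the reference's actual algorithm. The equi-width buckets, the error measure (largest difference in bucket fractions), the proportionality constant `factor`, and the `max_rounds` cap are all hypothetical choices that the claim text does not specify.

```python
import math
import random

def initial_sample_size(total_tuples, factor=1.0):
    """Sample size proportional to sqrt(total tuples); factor is an assumed constant."""
    return max(1, round(factor * math.sqrt(total_tuples)))

def bucket_counts(values, edges):
    """Partition values over equi-width buckets delimited by edges."""
    counts = [0] * (len(edges) - 1)
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / len(counts)
    for v in values:
        # Degenerate case (all values equal) falls into the first bucket.
        idx = 0 if width == 0 else min(int((v - lo) / width), len(counts) - 1)
        counts[idx] += 1
    return counts

def refine_histogram(data, n_buckets=4, threshold=0.05, seed=0, max_rounds=50):
    """Build a histogram from an initial sample, then refine it with extra samples."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    edges = [lo + (hi - lo) * i / n_buckets for i in range(n_buckets + 1)]
    k = initial_sample_size(len(data))
    hist = bucket_counts(rng.sample(data, k), edges)
    error = float("inf")
    for _ in range(max_rounds):
        extra = bucket_counts(rng.sample(data, k), edges)   # (i) partition additional sample
        total, extra_total = sum(hist), sum(extra)
        error = max(abs(h / total - e / extra_total)        # (ii) error in distribution
                    for h, e in zip(hist, extra))
        hist = [h + e for h, e in zip(hist, extra)]         # (iii) update the histogram
        if error <= threshold:                              # (iv) repeat until within threshold
            break
    return hist, error
```

The sketch operates on a single relation of data values. Nothing in it partitions key-value pairs across mapping functions or reduces two intermediate data sets on a common key.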

US8190610B2
CLAIM 41
. The computer system (square root) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 42
. The computer system (square root) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .
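
As a reading aid for claim 42 of US8190610B2, the partitioning step (some key-value pairs delivered to more than one partition, each partition handed to a separate reducer) might look roughly like the sketch below. The `broadcast_keys` parameter and the hash-modulo assignment are assumptions introduced for the example and are not taken from the patent.

```python
def partition_with_replication(pairs, n_reducers, broadcast_keys=frozenset()):
    """Assign each (key, value) pair to one or more reducer partitions.

    Keys listed in broadcast_keys (a hypothetical parameter) are copied to
    every partition; every other key goes to exactly one partition by hash.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if key in broadcast_keys:
            targets = range(n_reducers)
        else:
            targets = [hash(key) % n_reducers]
        for t in targets:
            partitions[t].append((key, value))
    return partitions

def run_reducers(partitions, reduce_fn):
    """Hand each partition to its own reducer and collect the outputs."""
    return [reduce_fn(part) for part in partitions]
```

Passing every key in `broadcast_keys` would send all pairs to all reducers, which is the arrangement claim 43 describes.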

US8190610B2
CLAIM 43
. The computer system (square root) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 44
. The computer system (square root) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 45
. The computer system (square root) of claim 44 , wherein the reducing includes processing the metadata .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .

US8190610B2
CLAIM 46
. The computer system (square root) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6278989B1
CLAIM 11
. The method of claim 1 , wherein the creating step (a) comprises the step of obtaining as the initial sample a random sample of data values from one relation of data of the database approximately linearly proportional in number to the square root (processing data, computer system) of the total number of data tuples from the one relation of data of the database .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6353818B1

Filed: 1998-08-19     Issued: 2002-03-05

Plan-per-tuple optimizing of database queries with user-defined functions

(Original Assignee) NCR Corp     (Current Assignee) Teradata US Inc

Felipe Carino, Jr.
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (communicatively couple, first data) group has a different schema (compile time) than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6353818B1
CLAIM 2
. The method of claim 1 , wherein the step of generating a plurality of query plans is performed at a database management system compile time (different schema) , and the steps of evaluating the plurality of query plans and selecting a query plan are performed at a database management system run time .

US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory (first data group) for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second data (second data) base management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .
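
For comparison purposes, the behaviour described in claims 2, 7 and 9 of US6353818B1 (several query plans generated at compile time, one selected at run time from measured system resource metrics) can be sketched as follows. The plan names, the metric names, the numeric requirements, and the "most headroom" selection rule are all hypothetical and serve only to illustrate the compile-time versus run-time split.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryPlan:
    name: str
    optimized_for: str   # resource metric the plan was optimized against
    required: float      # hypothetical amount of that resource the plan needs

def generate_plans():
    """'Compile time': produce one plan per resource metric (values are made up)."""
    return [
        QueryPlan("hash_join", "volatile_memory", 8.0),
        QueryPlan("sort_merge_join", "non_volatile_memory", 20.0),
        QueryPlan("nested_loop", "processing_capacity", 2.0),
    ]

def select_plan(plans, measured):
    """'Run time': keep plans whose metric is available, pick the one with most headroom."""
    feasible = [p for p in plans if measured.get(p.optimized_for, 0.0) >= p.required]
    if not feasible:
        raise RuntimeError("no plan fits the measured resources")
    return max(feasible, key=lambda p: measured[p.optimized_for] - p.required)
```

The sketch selects among alternative execution plans for one query. It contains no mapping functions, per-group intermediate data, or key-based merge.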

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , operates according to a different key of a different schema (compile time) than the iterator corresponding to another particular data group , for that reducer .
US6353818B1
CLAIM 2
. The method of claim 1 , wherein the step of generating a plurality of query plans is performed at a database management system compile time (different schema) , and the steps of evaluating the plurality of query plans and selecting a query plan are performed at a database management system run time .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with another reducer .
US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second database management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with that reducer .
US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second database management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (communicatively couple, first data) group has a different schema (compile time) than the data of a second data (second data) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6353818B1
CLAIM 2
. The method of claim 1 , wherein the step of generating a plurality of query plans is performed at a database management system compile time (different schema) , and the steps of evaluating the plurality of query plans and selecting a query plan are performed at a database management system run time .

US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory (first data group) for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second data (second data) base management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema (compile time) than the iterator corresponding to another particular data group , for that reducer .
US6353818B1
CLAIM 2
. The method of claim 1 , wherein the step of generating a plurality of query plans is performed at a database management system compile time (different schema) , and the steps of evaluating the plurality of query plans and selecting a query plan are performed at a database management system run time .

US8190610B2
CLAIM 30
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with another reducer .
US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second database management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 31
. The computer system of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data (communicatively couple, first data) that is associated with that reducer .
US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second database management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (compile time) over a computer system , the method comprising : for a first data (communicatively couple, first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set (communicatively couple, first data) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6353818B1
CLAIM 2
. The method of claim 1 , wherein the step of generating a plurality of query plans is performed at a database management system compile time (different schema) , and the steps of evaluating the plurality of query plans and selecting a query plan are performed at a database management system run time .

US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory (first data group) for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second data (second data) base management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (communicatively couple, first data) is a merging of a portion of the first and second intermediate data set .
US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second database management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (communicatively couple, first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (second data) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set (communicatively couple, first data) , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema (compile time) than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6353818B1
CLAIM 2
. The method of claim 1 , wherein the step of generating a plurality of query plans is performed at a database management system compile time (different schema) , and the steps of evaluating the plurality of query plans and selecting a query plan are performed at a database management system run time .

US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory (first data group) for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second data (second data) base management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set (communicatively couple, first data) is a merging of a portion of the first and second intermediate data set .
US6353818B1
CLAIM 7
. The method of claim 1 , wherein the system resource metric is selected from the group comprising : an available processing capacity of a database management system node ;
an available non-volatile memory for a database management system node ;
an available volatile memory capacity of a database management system node ;
and an available communications throughput capacity from a first data (first data, first data set, output data set, includes data) base management system node to a second database management system node .

US6353818B1
CLAIM 9
. An apparatus for executing a database query in a database management system , comprising : a query plan generator for generating a plurality of query plans , each query optimized with respect to at least one system resource metric of the database management system ;
and a query plan evaluator , communicatively couple (first data, first data set, output data set, includes data) d to a global resource object and a database management system node , the query plan evaluator for evaluating and selecting a query plan from the plurality of query plans at run time according to a measured system resource metric obtained from the global resource object means for predicting at least one resource requirement for the query ;
means for determining when the predicted resource requirement exceeds a threshold value ;
and means for generating a query plan when the predicted resource does not exceed the threshold value .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6223182B1

Filed: 1998-06-30     Issued: 2001-04-24

Dynamic data organization

(Original Assignee) Oracle Corp     (Current Assignee) Oracle International Corp

Nipun Agarwal, Linda Feng, Timothy Robertson
US8190610B2
CLAIM 1
. A method of processing data (square root) of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6223182B1
CLAIM 1
. A method of organizing data in a data container including a plurality of records , each of said plurality of records including a plurality of fields , said method comprising the computer-implemented steps of : determining codes for corresponding records of said plurality records based on bit-interleaving values from at least two of said plurality of fields belonging to said corresponding records ;
creating a first data (first data) base object having a plurality of rows corresponding to said plurality of records and a first column for holding said codes for said corresponding records and a second column for holding a reference to said corresponding records ;
creating a second database object containing prefixes of said codes based on said first database object ;
and subdividing the data container into a plurality of subsets based on said second database object .

US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .
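
The grouped map and merge steps recited in US8190610B2 claim 1 above can be sketched as follows. This is a single-process illustration under stated assumptions, not the patented implementation: the two data groups, their schemas, the common key dept_id, and all function names are hypothetical, and the round-robin partitioning stands in for whatever partitioning a distributed system would use.

```python
from collections import defaultdict

# Two hypothetical data groups with different schemas and a key in common (dept_id).
EMPLOYEES = [("alice", 10), ("bob", 20), ("carol", 10)]   # (emp_name, dept_id)
DEPARTMENTS = [(10, "search"), (20, "ads")]               # (dept_id, dept_name)


def map_employee(record):
    """Mapper for the employee group: emit (dept_id, emp_name)."""
    name, dept_id = record
    yield dept_id, name


def map_department(record):
    """Mapper for the department group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name


def partition(records, n):
    """Round-robin split into n data partitions (an assumed scheme)."""
    return [records[i::n] for i in range(n)]


def run_map(records, mapper, n_partitions=2):
    """Map each partition independently; collect a list of values per key."""
    intermediate = defaultdict(list)
    for part in partition(records, n_partitions):
        for record in part:
            for key, value in mapper(record):
                intermediate[key].append(value)
    return intermediate


def reduce_merge(key, emp_names, dept_names):
    """Iterate each group's values separately and merge them on the common key."""
    return [(key, dept, emp) for dept in dept_names for emp in emp_names]


def grouped_map_reduce():
    # Intermediate data stays identifiable to its data group via the dict key.
    by_group = {
        "emp": run_map(EMPLOYEES, map_employee),
        "dept": run_map(DEPARTMENTS, map_department),
    }
    output = []
    for key in sorted(set(by_group["emp"]) & set(by_group["dept"])):
        output.extend(reduce_merge(key, by_group["emp"][key], by_group["dept"][key]))
    return output


result = grouped_map_reduce()
```

The output rows have a schema (dept_id, dept_name, emp_name) that differs from both input schemas, which is the kind of join-like merge the claim language describes.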

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (more process) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6223182B1
CLAIM 10
. A computer-readable medium bearing instruction for organizing data in a data container including a plurality of records , each of said plurality of records including a plurality of fields , said instructions arranged for causing one or more process (particular data group) ors to perform the steps of : determining codes for corresponding records of said plurality records based on bit-interleaving values from at least two said plurality of fields belonging to said corresponding records ;
creating a first database object having a plurality of rows corresponding to said plurality of records and a first column for holding said codes for said corresponding records and a second column for holding a reference to said corresponding records ;
creating a second database object containing prefixes of said codes based on said first database object ;
and subdividing the data container into a plurality of subsets based on said second database object .
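
The "bit-interleaving" step recited in US6223182B1 claims 1 and 10 can be illustrated with a short sketch. The bit width, the choice of which field occupies the even bit positions, and the reading of "prefixes" as the high-order bits of each code are assumptions made for illustration; the function names are hypothetical.

```python
def interleave_bits(x, y, bits=8):
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def build_code_table(records, bits=8):
    """Rows of (code, reference): code from two fields, reference is the record index."""
    rows = [(interleave_bits(a, b, bits), idx) for idx, (a, b) in enumerate(records)]
    return sorted(rows)


def code_prefixes(rows, drop_bits):
    """Distinct code prefixes, taken here as each code with its low bits dropped."""
    return sorted({code >> drop_bits for code, _ in rows})


records = [(3, 5), (1, 0), (0, 1), (3, 4)]   # hypothetical two-field records
rows = build_code_table(records)
prefixes = code_prefixes(rows, drop_bits=4)
```

Records whose field values are close in both dimensions tend to share a prefix, which is why a prefix table can drive the subdivision into subsets.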

US8190610B2
CLAIM 13
. The method of claim 1 , wherein : the intermediate data processing step of the reducing step further comprises processing data (square root) that is not intermediate data .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .
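
The square-root grouping recited in US6223182B1 claim 6 can be sketched as below. The claim does not say how a non-integer square root is rounded or how entries are assigned to groups, so the floor rounding, the contiguous chunking, and the prefix-membership join are assumptions; the function names are hypothetical.

```python
import math


def group_entries(entries):
    """Split entries into p contiguous groups, p = floor(sqrt(len(entries)))."""
    if not entries:
        return []
    p = math.isqrt(len(entries))          # rounding down is an assumption
    size = math.ceil(len(entries) / p)    # entries per group
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def subsets_by_group(container, groups, prefix_of):
    """Join each group separately with the data container to form one subset per group."""
    subsets = []
    for group in groups:
        members = set(group)
        subsets.append([rec for rec in container if prefix_of(rec) in members])
    return subsets


groups = group_entries(list(range(9)))
subsets = subsets_by_group([0, 4, 8], groups, prefix_of=lambda rec: rec)
```

With nine prefix entries this yields three groups of three, and each record lands in the subset of the group that holds its prefix.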

US8190610B2
CLAIM 17
. A computer system (square root) including a plurality of computing devices (first column) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data (first data) group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6223182B1
CLAIM 1
. A method of organizing data in a data container including a plurality of records , each of said plurality of records including a plurality of fields , said method comprising the computer-implemented steps of : determining codes for corresponding records of said plurality records based on bit-interleaving values from at least two of said plurality of fields belonging to said corresponding records ;
creating a first data (first data) base object having a plurality of rows corresponding to said plurality of records and a first column (computing devices) for holding said codes for said corresponding records and a second column for holding a reference to said corresponding records ;
creating a second database object containing prefixes of said codes based on said first database object ;
and subdividing the data container into a plurality of subsets based on said second database object .

US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 18
. The computer system (square root) of claim 17 , wherein : the at least one output data group is a plurality of output data groups .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 19
. The computer system (square root) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 20
. The computer system (square root) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 21
. The computer system (square root) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 22
. The computer system (square root) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (more process) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US6223182B1
CLAIM 10
. A computer-readable medium bearing instruction for organizing data in a data container including a plurality of records , each of said plurality of records including a plurality of fields , said instructions arranged for causing one or more process (particular data group) ors to perform the steps of : determining codes for corresponding records of said plurality records based on bit-interleaving values from at least two said plurality of fields belonging to said corresponding records ;
creating a first database object having a plurality of rows corresponding to said plurality of records and a first column for holding said codes for said corresponding records and a second column for holding a reference to said corresponding records ;
creating a second database object containing prefixes of said codes based on said first database object ;
and subdividing the data container into a plurality of subsets based on said second database object .

US8190610B2
CLAIM 23
. The computer system (square root) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .
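
The partitioning limitation of US8190610B2 claim 23 (each key/value pair goes to exactly one partition, and each partition goes to its own reducer) can be sketched as follows. The CRC32-based partition function, the (group, key, value) tuple layout, and the sample data are assumptions for illustration only.

```python
import zlib
from collections import defaultdict


def partition_for(key, n_reducers):
    """Deterministic partition index for a key (CRC32 is an assumed choice)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_reducers


def shuffle(intermediate, n_reducers):
    """Send each (group, key, value) triple to exactly one partition.

    Each partition maps key -> list of (group, value), so a reducer can still
    tell which data group every value came from.
    """
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for group, key, value in intermediate:
        partitions[partition_for(key, n_reducers)][key].append((group, value))
    return partitions


# Hypothetical intermediate data from two data groups sharing dept_id keys.
intermediate = [
    ("emp", 10, "alice"), ("emp", 20, "bob"), ("emp", 10, "carol"),
    ("dept", 10, "search"), ("dept", 20, "ads"),
]
partitions = shuffle(intermediate, n_reducers=3)
```

Because the partition index depends only on the key, values for the same key from both data groups arrive at the same reducer. Claim 24, by contrast, recites providing some pairs to more than one partition, which this sketch does not do.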

US8190610B2
CLAIM 24
. The computer system (square root) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 25
. The computer system (square root) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 26
. The computer system (square root) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 27
. The computer system (square root) of claim 26 , wherein : the reducing includes processing the metadata .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 28
. The computer system (square root) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 29
. The computer system (square root) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data (square root) that is not intermediate data .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 30
. The computer system (square root) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with another reducer .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 31
. The computer system (square root) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer , includes data that is associated with that reducer .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 32
. The computer system (square root) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 33
. A map-reduce method of processing data (square root) from a plurality of groups having different schema over a computer system (square root) , the method comprising : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6223182B1
CLAIM 1
. A method of organizing data in a data container including a plurality of records , each of said plurality of records including a plurality of fields , said method comprising the computer-implemented steps of : determining codes for corresponding records of said plurality records based on bit-interleaving values from at least two of said plurality of fields belonging to said corresponding records ;
creating a first data (first data) base object having a plurality of rows corresponding to said plurality of records and a first column for holding said codes for said corresponding records and a second column for holding a reference to said corresponding records ;
creating a second database object containing prefixes of said codes based on said first database object ;
and subdividing the data container into a plurality of subsets based on said second database object .

US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 40
. A computer system (square root) including a plurality of computing devices (first column) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data (first data) set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6223182B1
CLAIM 1
. A method of organizing data in a data container including a plurality of records , each of said plurality of records including a plurality of fields , said method comprising the computer-implemented steps of : determining codes for corresponding records of said plurality records based on bit-interleaving values from at least two of said plurality of fields belonging to said corresponding records ;
creating a first database (first data) object having a plurality of rows corresponding to said plurality of records and a first column (computing devices) for holding said codes for said corresponding records and a second column for holding a reference to said corresponding records ;
creating a second database object containing prefixes of said codes based on said first database object ;
and subdividing the data container into a plurality of subsets based on said second database object .

US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .
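For comparison, the operations recited in US8190610B2 claim 40 above can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the patented implementation: the employee and department tuples, the mapper names, and the round-robin partitioning are all hypothetical.

```python
from collections import defaultdict


def map_employees(partition):
    # First schema (assumed): (emp_id, name, dept_id); emit dept_id as the key.
    for emp_id, name, dept_id in partition:
        yield dept_id, name


def map_departments(partition):
    # Second schema (assumed): (dept_id, dept_name); emit dept_id as the key.
    for dept_id, dept_name in partition:
        yield dept_id, dept_name


def run_map(data, mapper, n_partitions=2):
    """Partition one data group and collect each key's list of mapped values."""
    intermediate = defaultdict(list)
    for i in range(n_partitions):
        for key, value in mapper(data[i::n_partitions]):   # round-robin partition
            intermediate[key].append(value)
    return intermediate


def reduce_together(first, second):
    """Iterate over keys present in both intermediate sets and merge their values."""
    return [(name, dept)
            for key in sorted(first.keys() & second.keys())
            for name in first[key]
            for dept in second[key]]


employees = [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)]
departments = [(10, "Eng"), (20, "Ops"), (30, "HR")]
output = reduce_together(run_map(employees, map_employees),
                         run_map(departments, map_departments))
```

The output rows pair an employee name with a department name, a schema that differs from both inputs while relying on the key the two schemas share.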

US8190610B2
CLAIM 41
. The computer system (square root) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 42
. The computer system (square root) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the d the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .
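The partitioning recited in US8190610B2 claim 42 above, in which some intermediate key-value pairs reach more than one reducer partition, might look like the following sketch. The modulo assignment and the `replicate` predicate are assumptions introduced for illustration.

```python
def partition_for_reducers(pairs, n_reducers, replicate):
    """Assign each (key, value) pair to one partition, or to all if replicate(key)."""
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if replicate(key):
            targets = range(n_reducers)       # pair is copied to every partition
        else:
            targets = [key % n_reducers]      # assumption: integer keys, modulo routing
        for t in targets:
            partitions[t].append((key, value))
    return partitions


# Hypothetical intermediate pairs; key 10 is broadcast to both reducers.
pairs = [(10, "Eng"), (20, "Ops"), (11, "Ann")]
parts = partition_for_reducers(pairs, 2, replicate=lambda k: k == 10)
```

Making `replicate` return True for every key corresponds to the case in claim 43, where all pairs are provided to all reducers.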

US8190610B2
CLAIM 43
. The computer system (square root) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 44
. The computer system (square root) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the d the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 45
. The computer system (square root) of claim 44 , wherein the reducing includes processing the metadata .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .

US8190610B2
CLAIM 46
. The computer system (square root) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US6223182B1
CLAIM 6
. The method of claim 1 , wherein the step of subdividing the data container into a plurality of subsets includes the steps of : grouping entries of the second database objects into ‘p’ groups , wherein p is a square root (processing data, computer system) of a number of the entries of the second database ;
and separately joining each of the groups with the data container to form the subsets .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6167405A

Filed: 1998-04-27     Issued: 2000-12-26

Method and apparatus for automatically populating a data warehouse system

(Original Assignee) Bull HN Information Systems Inc     (Current Assignee) Bull HN Information Systems Inc

Kenneth R. Rosensteel, Jr., Jerry T Guhr, Joseph K. Picone
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (data replication) that each have a plurality of key-value pairs and providing each data partition (information representative, first control) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (information representative, first control) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions (mapping functions) of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication (data partitions) management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group (information representative, first control) is a plurality of output data groups .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 3
. The method of claim 1 , wherein : corresponding intermediate data for a data group (information representative, first control) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with a manner of storage is identifiable to the data group .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 4
. The method of claim 1 , wherein : corresponding intermediate data for a data group (information representative, first control) being identifiable to the data group includes the corresponding intermediate data being stored in association with data that is identifiable to the data group .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 5
. The method of claim 1 , wherein : processing the intermediate data for each data group (information representative, first control) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .
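The per-group iterator recited in US8190610B2 claim 5 above can be illustrated with a small sketch in which each intermediate value carries a tag naming its data group and the reduce function draws from one iterator per group. The tag strings and function names are hypothetical.

```python
def iterator_for(group, tagged_values):
    """Return an iterator over the values belonging to one data group."""
    return (value for tag, value in tagged_values if tag == group)


def reduce_key(key, tagged_values):
    """Merge the values of two groups that share `key`, using one iterator per group."""
    employees = iterator_for("employee", tagged_values)
    departments = list(iterator_for("department", tagged_values))  # reused per employee
    return [(key, name, dept) for name in employees for dept in departments]


# Hypothetical tagged intermediate values for the common key 10.
merged = reduce_key(10, [("employee", "Ann"), ("department", "Eng"), ("employee", "Cy")])
```

Because each iterator sees only its own group, the reduce logic can treat the two schemas differently while still merging on the shared key.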

US8190610B2
CLAIM 11
. The method of claim 10 , wherein : processing the intermediate data for each data group (information representative, first control) in a manner that is defined to correspond to that data group includes , for each data group , employing an iterator that corresponds to that data group , wherein the iterator includes providing the associated metadata to the processing of the reducing step .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 12
. The method of claim 5 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the reducing step is carried out by a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (information representative, first control) , for that reducer , operates according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 4
. The method of claim 1 wherein step (a) further includes providing extensions in the repository for creating and storing : objects identifying the data source entities (particular data group) and source data table entities and attributes from which information is to be extracted , objects identifying each target warehouse table entity and attributes and objects identifying the warehouse request entities that are to perform the extraction ;
and , links between the data source and source data table objects defining the references between the data source and source table entities , links between the source data table objects and the target table objects defining the references between corresponding portions of the source data and target table entities and links between each target table object and the particular warehouse request object for identifying those target tables to be populated by the warehouse request .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (data replication) that each have a plurality of key-value pairs and providing each data partition (information representative, first control) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group (information representative, first control) and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding port (mapping functions) ions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication (data partitions) management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .
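
Illustrative note (not part of either patent's text): the operations recited in claim 17 of US8190610B2 can be pictured with the minimal Python sketch below. It is a single-process simplification, not the patentee's implementation. The two data groups ("employees" and "departments"), their schemas, and every function name are hypothetical and chosen only for illustration. Each group is partitioned, mapped by its own group-specific mapping function, and the intermediate values are kept identifiable to their group so that the reducer can merge them on the key the two schemas have in common (here, a department id).

```python
from collections import defaultdict
from itertools import product

def partition(records, n):
    """Split a list of records into n partitions (round-robin)."""
    return [records[i::n] for i in range(n)]

def map_employees(part):
    # Hypothetical schema: (emp_id, name, dept_id). Emit dept_id as the key.
    return [(dept_id, name) for _emp_id, name, dept_id in part]

def map_departments(part):
    # Hypothetical schema: (dept_id, dept_name). Emit dept_id as the key.
    return [(dept_id, dept_name) for dept_id, dept_name in part]

def join_reducer(key, iterators):
    # One iterator per data group; merge the groups on the common key.
    names = list(iterators["employees"])
    depts = list(iterators["departments"])
    return [(key, name, dept) for name, dept in product(names, depts)]

def run(groups, mappers, reducer, n_partitions=2):
    # intermediate[key][group] holds that group's values for the key,
    # so intermediate data remains identifiable to its data group.
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in groups.items():
        for part in partition(records, n_partitions):
            for key, value in mappers[group](part):
                intermediate[key][group].append(value)
    output = []
    for key in sorted(intermediate):
        iterators = {g: iter(intermediate[key][g]) for g in groups}
        output.extend(reducer(key, iterators))
    return output

groups = {
    "employees": [(1, "Ann", 10), (2, "Bob", 20), (3, "Cy", 10)],
    "departments": [(10, "Eng"), (20, "Ops"), (30, "HR")],
}
mappers = {"employees": map_employees, "departments": map_departments}
result = run(groups, mappers, join_reducer)
print(result)
```

In this sketch the output tuples (dept_id, name, dept_name) have a schema different from either input group, and a key present in only one group (department 30) produces no output, which corresponds to an inner join. Whether the warehouse-request method of US6167405A discloses anything comparable is a question for the chart itself; the sketch takes no position on it.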

US8190610B2
CLAIM 18
. The computer system of claim 17 , wherein : the at least one output data group (information representative, first control) is a plurality of output data groups .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 19
. The computer system of claim 17 , wherein : corresponding intermediate data for a data group (information representative, first control) being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 20
. The computer system of claim 19 , wherein : corresponding intermediate data for a data group (information representative, first control) being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 21
. The computer system of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group (information representative, first control) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 22
. The computer system of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group (information representative, first control) , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 4
. The method of claim 1 wherein step (a) further includes providing extensions in the repository for creating and storing : objects identifying the data source entities (particular data group) and source data table entities and attributes from which information is to be extracted , objects identifying each target warehouse table entity and attributes and objects identifying the warehouse request entities that are to perform the extraction ;
and , links between the data source and source data table objects defining the references between the data source and source table entities , links between the source data table objects and the target table objects defining the references between corresponding portions of the source data and target table entities and links between each target table object and the particular warehouse request object for identifying those target tables to be populated by the warehouse request .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 28
. The computer system of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group (information representative, first control) in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding portions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (information representative, first control) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data replication) and providing each data partition (information representative, first control) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first type) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding port (mapping functions) ions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication (data partitions) management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 14
. The method of claim 13 wherein in response to selecting the first type (first set) of Extract icon object , the DRM management request section provides through the menu facility , a first menu for displaying properties pertaining to a database event that include a database programming language statement , a database name , a database type and an Event description .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group (information representative, first control) and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (data replication) and providing each data partition (information representative, first control) to a selected one of a plurality of mapping functions (corresponding port) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (first type) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6167405A
CLAIM 1
. A method for facilitating the creation of data warehouse requests for populating data warehouse tables defining a particular warehouse design in a data warehouse system comprised of a number of data source systems and a target system , a repository component for storing information representative (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) of the warehouse design and a warehouse management interface component operatively coupled to the repository for enabling development of warehouse requests required to populate the warehouse tables , each warehouse request having a plurality of subcomponents specifying a process of extracting data from source tables of a source database located in one of the data source systems , moving the data to the target system , transforming the data to match target system requirements and then storing the data into a target database of the target system , the method comprising the steps of : (a) during the design phase , generating and storing in the repository , information defining reference links between each target data warehouse table and the source tables from which instances must be extracted , identification of the source databases and target database , reference links between corresponding port (mapping functions) ions of the source and target tables , and identification of those warehouse request entities related to a number of target tables to be populated by a particular warehouse request ;
(b) upon completion of the design phase , invoking a data replication (data partitions) management (DRM) component included within the warehouse management interface component in response to a selection of a warehouse request to be implemented ;
(c) in response to the selection , automatically creating the different subcomponents of a data warehouse request by the DRM component accessing the previously created reference links from the repository and displaying a visual representation of the subcomponents of the request ;
and , (d) providing access to menus of a menu facility for enabling visualization of the automatically created data warehouse request and for making any required modifications to information related to the each of the subcomponents selected for display prior to scheduling the request for execution .

US6167405A
CLAIM 14
. The method of claim 13 wherein in response to selecting the first type (first set) of Extract icon object , the DRM management request section provides through the menu facility , a first menu for displaying properties pertaining to a database event that include a database programming language statement , a database name , a database type and an Event description .

US6167405A
CLAIM 25
. The apparatus of claim 24 wherein the DRM component menu facility further includes : (c) first control (data partition, data group, second data group, s corresponding data partition to form corresponding intermediate data, s corresponding data partition) means operative in response to being supplied properties pertaining to a particular subcomponent object , to signal when the subcomponent object or event has been configured by the DRM component .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6509898B2

Filed: 1998-04-17     Issued: 2003-01-21

Usage based methods of traversing and displaying generalized graph structures

(Original Assignee) Xerox Corp     (Current Assignee) Google LLC

Ed H. Chi, Peter L. T. Pirolli, James E. Pitkow, Rich Gossweller, Jock D. Mackinlay, Stuart K. Card
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (depth h) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6509898B2
CLAIM 4
. A method as in claim 3 , further comprising the step of : (g) incrementing the current depth if all nodes at the current depth have (mapping functions) been selected .
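
For orientation only, the grouped map-reduce flow recited in claim 1 of US8190610B2 might be sketched as below. This is an illustrative reading, not the patentee's implementation and not a claim construction. The data groups (`employees`, `departments`), their record layouts, and every function name are hypothetical, and the sketch runs in a single process, so the distributed-system limitation is not modeled.

```python
from collections import defaultdict

# Two hypothetical data groups with different schemas that share the key dept_id.
employees = [("alice", 10), ("bob", 20), ("carol", 10)]   # (name, dept_id)
departments = [(10, "sales"), (20, "ops")]                # (dept_id, dept_name)

def map_employee(record):
    """Map function for the first group: emit (dept_id, name)."""
    name, dept_id = record
    yield dept_id, name

def map_department(record):
    """Map function for the second group: emit (dept_id, dept_name)."""
    dept_id, dept_name = record
    yield dept_id, dept_name

def partition(data, n):
    """Split one group's records into n data partitions (round-robin)."""
    return [data[i::n] for i in range(n)]

def grouped_map(groups, n_partitions=2):
    """Run each group's own map function over that group's partitions.

    The result is keyed first by the common key and then by group name,
    so every intermediate value stays identifiable to its data group.
    """
    intermediate = defaultdict(lambda: defaultdict(list))
    for group_name, (data, map_fn) in groups.items():
        for part in partition(data, n_partitions):
            for record in part:
                for key, value in map_fn(record):
                    intermediate[key][group_name].append(value)
    return intermediate

def reduce_join(key, values_by_group):
    """Handle each group's values in its own way and merge on the common key."""
    dept_names = values_by_group.get("departments", [])
    for name in values_by_group.get("employees", []):
        for dept_name in dept_names:
            yield (name, key, dept_name)

def grouped_map_reduce(groups, reduce_fn):
    intermediate = grouped_map(groups)
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_fn(key, intermediate[key]))
    return output

GROUPS = {
    "employees": (employees, map_employee),
    "departments": (departments, map_department),
}
result = grouped_map_reduce(GROUPS, reduce_join)
```

The output rows carry a (name, dept_id, dept_name) layout that matches neither input group, which is one way to picture the merging on the key in common.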

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices (display device) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions (depth h) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6509898B2
CLAIM 4
. A method as in claim 3 , further comprising the step of : (g) incrementing the current depth if all nodes at the current depth have (mapping functions) been selected .

US6509898B2
CLAIM 34
. An apparatus for displaying a tree structure , comprising : a processor ;
a display device (computing devices) coupled to the processor ;
and a processor readable storage medium coupled to the processor containing processor readable program code for programming the apparatus to perform a method for displaying the tree structure , the method comprising the steps of : (a) for each group of sibling nodes in the tree structure , ranking the sibling nodes according to a usage parameter value associated with each node ;
and (b) positioning each group of sibling nodes in accordance with the rankings of the sibling nodes .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (depth h) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6509898B2
CLAIM 4
. A method as in claim 3 , further comprising the step of : (g) incrementing the current depth if all nodes at the current depth have (mapping functions) been selected .

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices (display device) , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions (depth h) that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6509898B2
CLAIM 4
. A method as in claim 3 , further comprising the step of : (g) incrementing the current depth if all nodes at the current depth have (mapping functions) been selected .

US6509898B2
CLAIM 34
. An apparatus for displaying a tree structure , comprising : a processor ;
a display device (computing devices) coupled to the processor ;
and a processor readable storage medium coupled to the processor containing processor readable program code for programming the apparatus to perform a method for displaying the tree structure , the method comprising the steps of : (a) for each group of sibling nodes in the tree structure , ranking the sibling nodes according to a usage parameter value associated with each node ;
and (b) positioning each group of sibling nodes in accordance with the rankings of the sibling nodes .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US5943663A

Filed: 1997-12-13     Issued: 1999-08-24

Data processing method and system utilizing parallel processing

(Original Assignee) Mouradian; Gary C.     

Gary C. Mouradian
US8190610B2
CLAIM 1
. A method of processing data of a data set over a distributed system , wherein the data set comprises a plurality of data groups (new pressure) , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said memory) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory (second data) .

US8190610B2
CLAIM 2
. The method of claim 1 , wherein : the at least one output data group is a plurality of output data groups (new pressure) .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 14
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (representing data) , includes data that is associated with another reducer .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data (particular reducer) elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 15
. The method of claim 13 , wherein : the reducing step is carried out by a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (representing data) , includes data that is associated with that reducer .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data (particular reducer) elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 16
. The method of claim 1 wherein the reducing step includes relating the data among the plurality of data groups (new pressure) .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 17
. A computer system (new pressure) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (new pressure) , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data (said memory) group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory (second data) .

US8190610B2
CLAIM 18
. The computer system (new pressure) of claim 17 , wherein : the at least one output data group is a plurality of output data groups (new pressure) .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 19
. The computer system (new pressure) of claim 17 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the corresponding intermediate data being stored such that metadata associated with the manner of storage is identifiable to the data group .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 20
. The computer system (new pressure) of claim 19 , wherein : corresponding intermediate data for a data group being identifiable to the data group includes the computer system being configured to store the corresponding intermediate data in association with data that is identifiable to the data group .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 21
. The computer system (new pressure) of claim 17 , wherein : the at least one processor and memory being configured to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being configured to employ an iterator that corresponds to that data group .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 22
. The computer system (new pressure) of claim 21 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

for at least one of the reducers , the iterator corresponding to a particular data group , for that reducer , is configured to operate according to a different key of a different schema than the iterator corresponding to another particular data group , for that reducer .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .
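
As a reading aid for claims 21 and 22 of US8190610B2, the idea of a reducer that employs one iterator per data group, each driven by a key from that group's own schema, might look like the sketch below. The group names (`orders`, `payments`), field names, and helper functions are hypothetical and chosen only for illustration. Nothing here is taken from the patent's specification or from US5943663A.

```python
from operator import itemgetter

# Hypothetical intermediate data for ONE reduce key, kept separately per data group.
intermediate_for_key = {
    "orders": [
        {"order_id": 7, "amount": 30},
        {"order_id": 3, "amount": 12},
    ],
    "payments": [
        {"paid_on": "2006-02-01", "amount": 12},
        {"paid_on": "2006-01-15", "amount": 30},
    ],
}

# Assumption: each group's iterator is ordered by a different key,
# taken from that group's own schema.
ITERATOR_KEYS = {"orders": "order_id", "payments": "paid_on"}

def group_iterator(group_name, values):
    """Return an iterator over one group's values, ordered by that group's key."""
    return iter(sorted(values, key=itemgetter(ITERATOR_KEYS[group_name])))

def reduce_with_group_iterators(values_by_group):
    """Consume each group through its own iterator and merge the results."""
    orders = list(group_iterator("orders", values_by_group["orders"]))
    payments = group_iterator("payments", values_by_group["payments"])
    first_payment = next(payments, None)
    return {
        "order_ids": [o["order_id"] for o in orders],
        "total_ordered": sum(o["amount"] for o in orders),
        "first_paid_on": first_payment["paid_on"] if first_payment else None,
    }

summary = reduce_with_group_iterators(intermediate_for_key)
```

The two iterators are independent, so the reducer can walk one group fully while only peeking at the other, which is the kind of per-group handling the claim language appears to describe.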

US8190610B2
CLAIM 23
. The computer system (new pressure) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide each key/value pair of the intermediate data to a separate one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 24
. The computer system (new pressure) of claim 17 , wherein : the intermediate data includes a plurality of grouped sets of key/value pairs ;

the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the at least one processor and memory are further operable to : partition the intermediate data into a plurality of partitions , to provide at least some of the key/value pairs of the intermediate data to more than one of the partitions ;

and provide the intermediate data of each partition to a separate one of the reducers .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .
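
Illustrative note on claims 24 and 25 of US8190610B2: the partitioning step recited above (some key/value pairs provided to more than one partition, each partition provided to a separate reducer) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patentee's implementation. The names partition_intermediate, route_by_key, route_replicating_small_group, route_to_all and run_reducers are hypothetical, as are the integer keys and the "small"/"big" group tags.

```python
def partition_intermediate(pairs, n_reducers, route):
    """Place each (key, value) pair into every partition chosen by route().

    route(key, value, n) is a hypothetical routing callback that returns an
    iterable of partition indices, so one pair may land in several partitions.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        for idx in set(route(key, value, n_reducers)):
            partitions[idx].append((key, value))
    return partitions


def route_by_key(key, value, n):
    # Conventional routing: exactly one partition per key.
    # Assumes integer keys so the result is deterministic.
    return [key % n]


def route_replicating_small_group(key, value, n):
    # Claim 24 style: pairs tagged as belonging to the (assumed) "small"
    # group go to every partition; all other pairs go to one partition.
    group, _payload = value
    return range(n) if group == "small" else [key % n]


def route_to_all(key, value, n):
    # Claim 25 style: every pair goes to every partition.
    return range(n)


def run_reducers(partitions, reducer):
    """Give each partition to its own reducer invocation."""
    return [reducer(part) for part in partitions]
```

Passing a different routing callback switches between one-partition-per-key routing, selective replication and full replication without changing the reducers.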

US8190610B2
CLAIM 25
. The computer system (new pressure) of claim 24 , wherein : the at least one processor and memory are further operable to , in partitioning , to provide all of the key/value pairs of the intermediate data to all of the partitions .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 26
. The computer system (new pressure) of claim 24 , wherein : at least some of the reducers include a sort , group-by-key and combine task ;

the at least one processor and memory are further operable to generate and provide metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 27
. The computer system (new pressure) of claim 26 , wherein : the reducing includes processing the metadata .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 28
. The computer system (new pressure) of claim 27 , wherein : the at least one processor and memory being further operable to process the intermediate data for each data group in a manner that is defined to correspond to that data group includes , for each data group , the at least one processor and memory being further operable to employ an iterator that corresponds to that data group , wherein the iterator is configured to provide the associated metadata to the reducing .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 29
. The computer system (new pressure) of claim 17 , wherein : the intermediate data processing of the reducing further comprises processing data that is not intermediate data .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 30
. The computer system (new pressure) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (representing data) , includes data that is associated with another reducer .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data (particular reducer) elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 31
. The computer system (new pressure) of claim 29 , wherein : the at least one processor and memory are further operable to reduce the intermediate data via a plurality of reducers ;

and the data that is not intermediate data , for a particular reducer (representing data) , includes data that is associated with that reducer .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data (particular reducer) elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 32
. The computer system (new pressure) of claim 17 , wherein the at least one processor and memory are further operable to reduce by relating the data among the plurality of data groups (new pressure) .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema over a computer system (new pressure) , the method comprising : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said memory) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory (second data) .

US8190610B2
CLAIM 40
. A computer system (new pressure) including a plurality of computing devices , the computer system configured to process data of a data set , wherein the data set comprises a plurality of data groups (new pressure) , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set of resulting key-value pairs ;

for a second data (said memory) set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory (second data) .

US8190610B2
CLAIM 41
. The computer system (new pressure) of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set so that the output data set is a merging of a portion of the first and second intermediate data set .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 42
. The computer system (new pressure) of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 43
. The computer system (new pressure) of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set are provided to all of the reducers .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 44
. The computer system (new pressure) of claim 42 , wherein at least some of the reducers include a sort , group-by-key and combine task , the at least one processor and memory are further operable for : generating and providing metadata for at least some of the mapping , partitioning , combining , grouping and sorting .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 45
. The computer system (new pressure) of claim 44 , wherein the reducing includes processing the metadata .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .

US8190610B2
CLAIM 46
. The computer system (new pressure) of claim 45 , wherein iterating includes providing the associated metadata to the processing of the reducing step .
US5943663A
CLAIM 1
. A computer implemented method in a system for data processing that includes a computer having a memory and at least one processor connected for accessing the memory , the method comprising the steps of : measuring sensory signals and converting the sensory signals to electronic signals representing data elements in the form of binary signals through an input device , said electronic signals being provided at a predetermined data resolution rate ;
conveying the electronic signals from the input device to the computer ;
creating pressure based data forms from data elements received from the input device , each pressure based data form having a time stamp element , a spatial address element , and data form element ;
iteratively evolving relative awareness by converging two or more pressure based data forms by applying gravitational logic to form new pressure (data groups, computer system) based data forms , each of the pressure based data forms representing specific relative awareness states ;
and means for storing said pressure based data forms in said memory .




US8190610B2

Filed: 2006-10-05     Issued: 2012-05-29

MapReduce for distributed database processing

(Original Assignee) Yahoo Inc     (Current Assignee) R2 Solutions LLC ; Altaba Inc

Ali Dasdan, Hung-Chih Yang, Ruey-Lung Hsiao
US6321374B1

Filed: 1997-11-07     Issued: 2001-11-20

Application-independent generator to generate a database transaction manager in heterogeneous information systems

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

David Mun-Hien Choy
US8190610B2
CLAIM 1
. A method of processing data of a data set (input file) over a distributed system , wherein the data set comprises a plurality of data groups , the method comprising : partitioning the data of each one of the data groups into a plurality of data partitions (one file) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reducing the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group , so as to result in a merging of the corresponding different intermediate data based on the key in common , wherein the mapping and reducing operations are performed by a distributed system .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US6321374B1
CLAIM 15
. The method of claim 14 including means for taking input from at least one file (data partitions) .
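
Illustrative note on claim 1 of US8190610B2: the claimed flow (per-group partitioning, group-specific mapping, and a reduce step that merges the groups' intermediate data on a common key) can be sketched as follows. This is a minimal single-process sketch under stated assumptions, not the patentee's implementation and not a distributed system. The "employee" and "department" groups, their schemas, and every function name are hypothetical.

```python
from collections import defaultdict
from itertools import product


def partition(records, n):
    """Split one data group's records into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    # Assumed schema: (emp_id, name, dept_id). The common key is dept_id.
    return [(dept_id, name) for _emp_id, name, dept_id in part]


def map_department(part):
    # Assumed schema: (dept_id, dept_name). The common key is dept_id.
    return [(dept_id, dept_name) for dept_id, dept_name in part]


def run_map(groups, mappers, n_partitions=2):
    """Build intermediate[key][group] -> values, so that each value stays
    identifiable to the data group that produced it."""
    intermediate = defaultdict(lambda: defaultdict(list))
    for group, records in groups.items():
        for part in partition(records, n_partitions):
            for key, value in mappers[group](part):
                intermediate[key][group].append(value)
    return intermediate


def reduce_join(key, per_group):
    """Use one iterator per data group and merge them on the common key."""
    names = iter(per_group.get("employee", []))
    depts = iter(per_group.get("department", []))
    return [(key, name, dept) for name, dept in product(names, depts)]


def grouped_map_reduce(groups, mappers, reducer):
    """Run the map phase, then reduce key by key in sorted key order."""
    intermediate = run_map(groups, mappers)
    output = []
    for key in sorted(intermediate):
        output.extend(reducer(key, intermediate[key]))
    return output
```

The output rows have a schema (dept_id, name, dept_name) that differs from both input schemas, and keys present in only one group produce no output rows in this particular reducer.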

US8190610B2
CLAIM 17
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (input file) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : partitioning the data of each one of the data groups into a plurality of data partitions (one file) that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group , wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data , wherein the different schema and corresponding different intermediate data have a key in common ;

and reduce the intermediate data for the data groups to at least one output data group , including processing the intermediate data for each data group in a manner that is defined to correspond to that data group so as to result in a merging of the corresponding different intermediate data based on the key in common .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US6321374B1
CLAIM 15
. The method of claim 14 including means for taking input from at least one file (data partitions) .

US8190610B2
CLAIM 33
. A map-reduce method of processing data from a plurality of groups having different schema (associated parameters) over a computer system , the method comprising : for a first data set (input file) having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (one file) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (input file) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the mapping and reducing operations are performed by a distributed system .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US6321374B1
CLAIM 14
. The method of claim 14 wherein the database transaction manager object is multifunctional thereby being operable to take execution options , sequence of operations , and their associated parameters (groups having different schema) and data from at least one input .

US6321374B1
CLAIM 15
. The method of claim 14 including means for taking input from at least one file (data partitions) .
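
The map and reduce steps recited in claim 33 of US8190610B2 above can be illustrated with a minimal single-process sketch. It is not the patent's implementation: the sample data, the group tags "emp" and "dept", and all function names are hypothetical, and the loop stands in for what the claim describes as a distributed system. Two data groups with different schemas share the key dept_id, each group is partitioned and mapped by its own map function, and the reducer iterates over each group's values separately to merge them on the common key.

```python
from collections import defaultdict

# Hypothetical sample data: two groups with different schemas, common key dept_id.
EMPLOYEES = [(1, "Ann"), (2, "Bob"), (1, "Cy")]      # (dept_id, employee_name)
DEPARTMENTS = [(1, "Search", 3), (2, "Ads", 5)]      # (dept_id, dept_name, floor)


def partition(records, n):
    """Split a data set into n data partitions (round-robin)."""
    return [records[i::n] for i in range(n)]


def map_employee(part):
    """Map function for the first group; values are tagged with their group."""
    return [(dept_id, ("emp", name)) for dept_id, name in part]


def map_department(part):
    """Map function for the second group, which has a different schema."""
    return [(dept_id, ("dept", (dept_name, floor)))
            for dept_id, dept_name, floor in part]


def shuffle(pairs):
    """Collect intermediate values per key, one list per data group."""
    grouped = defaultdict(lambda: {"emp": [], "dept": []})
    for key, (group, value) in pairs:
        grouped[key][group].append(value)
    return grouped


def reduce_join(emp_values, dept_values):
    """Iterate over each group's values separately and merge them."""
    return [(name, dept_name, floor)
            for name in emp_values
            for dept_name, floor in dept_values]


def run(n_partitions=2):
    intermediate = []
    for part in partition(EMPLOYEES, n_partitions):
        intermediate.extend(map_employee(part))
    for part in partition(DEPARTMENTS, n_partitions):
        intermediate.extend(map_department(part))
    grouped = shuffle(intermediate)
    output = []
    for key in sorted(grouped):
        output.extend(reduce_join(grouped[key]["emp"], grouped[key]["dept"]))
    return output


print(run())
```

The output rows have the schema (employee_name, dept_name, floor), which differs from both input schemas, mirroring the claim's requirement that the output data set have a different schema than the first and second schema.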

US8190610B2
CLAIM 34
. The map-reduce method of claim 33 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (input file) so that the output data set is a merging of a portion of the first and second intermediate data set .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US8190610B2
CLAIM 35
. The map-reduce method of claim 33 , wherein the reducing is accomplished by a plurality of reducers , the method further comprises : partitioning the first or second intermediate data set (input file) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US8190610B2
CLAIM 36
. The map-reduce method of claim 35 , wherein all of the key-value pairs of the first and second intermediate data set (input file) are provided to all of the reducers .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .
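
The reducer-side partitioning recited in claims 35 and 36 of US8190610B2 above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name partition_for_reducers, the replicate predicate, and the use of integer keys with hash(key) % n_reducers are all hypothetical choices. A pair normally goes to one partition; when the predicate is true the pair is copied to every partition (claim 35: some pairs reach more than one partition), and a predicate that is always true sends all pairs to all reducers (claim 36).

```python
def partition_for_reducers(pairs, n_reducers, replicate=lambda key: False):
    """Assign intermediate (key, value) pairs to reducer partitions.

    By default each pair goes to the single partition selected by its key.
    If the hypothetical predicate replicate(key) is true, the pair is
    copied to every partition instead.
    """
    partitions = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        if replicate(key):
            for part in partitions:
                part.append((key, value))
        else:
            # Integer keys are assumed so the assignment is deterministic.
            partitions[hash(key) % n_reducers].append((key, value))
    return partitions


pairs = [(1, "a"), (2, "b"), (3, "c")]

# Claim 35 style: only key 3 is provided to more than one partition.
some = partition_for_reducers(pairs, 2, replicate=lambda k: k == 3)

# Claim 36 style: every pair is provided to every reducer.
everything = partition_for_reducers(pairs, 2, replicate=lambda k: True)

print(some)
print(everything)
```

Each resulting partition would then be handed to a separate reducer, as the final step of claim 35 recites.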

US8190610B2
CLAIM 40
. A computer system including a plurality of computing devices , the computer system configured to process data of a data set (input file) , wherein the data set comprises a plurality of data groups , the computer system comprises at least one processor and memory that are operable to perform the following operations : for a first data set having a plurality of first key-value pairs , wherein such first data set belongs to a first data group and the first key-value pairs have a first schema , partitioning the first data set into a plurality of data partitions (one file) and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function's corresponding data partition to form a first intermediate data set having a first set (input file) of resulting key-value pairs ;

for a second data set having a plurality of second key-value pairs , wherein such second data set belongs to a second data group and the second key-value pairs have a second schema , partitioning the first data set into a plurality of data partitions and providing each data partition to a selected one of a plurality of mapping functions to form a second intermediate data set having a second set of resulting key-value pairs differing from the first set of resulting key-value pairs , wherein the first schema differs from the second schema ;

and reducing the first and the second intermediate data set together so as to form an output data set , wherein the reducing is accomplished by iterating on at least one key from the first and second intermediate data set , and the output data set has a different schema than the first and second schema , wherein the first and second schema and the first and second set of resulting key-value pairs have a key in common .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US6321374B1
CLAIM 15
. The method of claim 14 including means for taking input from at least one file (data partitions) .

US8190610B2
CLAIM 41
. The computer system of claim 40 , wherein the reducing is accomplished by iterating on at least one key from each of the first and second intermediate data set (input file) so that the output data set is a merging of a portion of the first and second intermediate data set .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US8190610B2
CLAIM 42
. The computer system of claim 40 , wherein the reducing is accomplished by a plurality of reducers , the at least one processor and memory further operable for : partitioning the first or second intermediate data set (input file) into a plurality of partitions , at least some of the key-value pairs of the first or second intermediate data being provided to more than one of the partitions ;

and providing the partitioned intermediate data set of each partition to a separate one of the reducers .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .

US8190610B2
CLAIM 43
. The computer system of claim 42 , wherein all of the key-value pairs of the first and second intermediate data set (input file) are provided to all of the reducers .
US6321374B1
CLAIM 11
. The method of claim 5 wherein the database transaction manager object includes means for performing at least some of the following operations : initialize for execution , terminate execution , set execution options , imbed another file in an input file (first set, data set, first data set, output data set, intermediate data set) , write a “user” log record , store a Cataloged object or its catalog record , replace a cataloged object or its catalog record , update a set of catalog records , retrieve a set of cataloged objects or their catalog records , delete a set of cataloged objects or their catalog records , nonrecoverably delete a set of Cataloged objects or their catalog records , commit changes , and roll back changes .