Enabling Real-Time Business Intelligence - Third International Workshop, BIRTE 2009, Held at the 35th International Conference on Very Large Databases, VLDB 2009, Lyon, France, August 24, 2009, Revised Selected Papers

by: Malu Castellanos, Umeshwar Dayal, Renée J. Miller

Springer-Verlag, 2010

ISBN: 9783642145599, 181 pages

Format: PDF, OL

Copy protection: Watermark

Windows PC, Mac OS X; suitable for all DRM-capable eReaders; Apple iPad, Android tablet PCs; online reading for: Windows PC, Mac OS X, Linux

Price: 49.22 EUR

More about the contents

Preface

5

Organization

6

Table of Contents

7

Queries over Unstructured Data: Probabilistic Methods to the Rescue (Keynote)

8

Unstructured Data in Enterprises

8

Probabilistic Models for Information Extraction

10

Representing Noisy Extractions as Imprecise Databases

11

Multi-attribute Extractions

13

Imprecise Data Models for Representing Uncertainty of De-duplication

15

Probability of Two Records Being Duplicates

15

Probability over Entity Groupings

15

Queries over Imprecise Duplicates

16

Concluding Remarks

18

References

19

Federated Stream Processing Support for Real-Time Business Intelligence Applications

21

Introduction

21

Related Work

22

The MaxStream Federated Stream Processing System

24

Architecture

26

Two Key Building Blocks

28

Hybrid Queries: Using Persistence with Streams

30

Using MaxStream in Real-Time BI Scenarios

32

Reducing Latency in Event-Driven Business Intelligence

32

Persistent Events in Supply-Chain Monitoring

33

Other Real-Time BI Applications

34

Feasibility Study

34

Conclusions and Future Directions

36

References

37

VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

39

Introduction

39

Preliminaries

42

Review of the Chain Scheduling

42

Problem Definition

43

The VPipe Execution Scheme

44

Change of Operator Logic

45

Discussion

47

Stochastic Analysis of Chain

47

System Model Basis

48

Case 1: System Analysis for SOS Synchronization

48

Case 2: System Analysis for IDS Synchronization

50

Performance Study

53

Experiment 1: Response Time Comparison

53

Experiment 2: Broken Pipeline Probability

54

Related Work

54

Conclusion

55

References

55

Ad-Hoc Queries over Document Collections – A Case Study

57

Introduction

57

Query Planning and Query Plan Execution

59

Understanding “Human-Powered” Query Execution Strategies

59

Elementary Plan Operators

60

The Coverage-Join (CJ) and Density-Join (DJ) Operator

64

Example Query and Example Plans

64

Plan Enumeration

65

Case Study

66

Heuristics for Plan Selection

66

Results and Discussion

67

Related Work

69

Summary and Future Work

70

References

71

Appendix: Implementing the KEYWORD-Operator

72

ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP

73

Introduction

73

Grouping Analysis: A Retrospective

74

Group by

75

Cubes

75

Grouping Variables and the MD-Join

76

Windows

76

MapReduce

77

Associated Sets (ASSET) Queries

77

Definitions

77

SQL Syntax

78

DataMingler: A Spreadsheet-Like GUI

79

ASSET Queries and Data Streams (COSTES)

80

Financial Application Motivating Examples

81

COSTES: Continuous Spreadsheet-Like Computations

83

ASSET Queries and Persistent Data Sources (ASSET QE)

84

Social Networks: A Motivating Example

84

ASSET Query Engine (QE)

86

Conclusions and Future Work

88

References

89

Evaluation of Load Scheduling Strategies for Real-Time Data Warehouse Environments

91

Introduction

91

System Model and Problem Statement

93

System Architecture

93

Workload Model

94

Scheduling Performance Objective

95

Problem Statement

96

Scheduling Policies

97

Scheduling Algorithms for Push-Based Update Propagation

97

Evaluation and Discussion

98

Simulation Framework

98

Effect of the Data Production Process Length

99

Comparison of Local and Global Scheduling

100

Effects of Stage-Concurrent and Long-Running Updates

101

Ratio of Stage-Concurrent Updates

102

Pruning of Irretrievable Queries

103

Effects of Long-Running Update and Queries during Runtime

103

Related Work

104

Conclusion

105

References

106

Near Real-Time Data Warehousing Using State-of-the-Art ETL Tools

107

Near Real-Time Data Warehousing

107

Related Work

108

Data Warehouse Refreshment Anomalies

110

Properties of Operational Data Sources

114

Preventing Refreshment Anomalies

115

Preventing a Change Data Mismatch

116

Making Change Propagation Anomaly-Proof

119

Conclusion

123

References

123

Addressing BI Transactional Flows in the Real-Time Enterprise Using GoldenGate TDM (Industrial Paper)

125

Background

125

Operational Data, and the Real-Time Enterprise

126

Transactional Data

126

Real-Time

127

Heterogeneous Systems and Interoperability

129

Transactional Consistency

130

Emerging Trends and Problems

130

Amount of Data

130

Adoption of Newer Datatypes

131

Growing Number of Users

131

Changing Nature of Applications

131

Micro-batching

132

Real-Time Data Acquisition

132

ETL and Real-Time Challenges

132

ESB

133

Change Data Capture (CDC)

133

GoldenGate TDM Platform

136

GoldenGate Architecture, Key Components

137

Key Architectural Features and Benefits

141

Use Cases

143

Example Customer Case Studies with Business Challenges, Real-Time Solutions

144

Challenges

146

Conclusion

147

References

148

Near Real-Time Call Detail Record ETL Flows (Industrial Paper)

149

Introduction

149

MVNO Background

150

Problem Statement

151

Our Solution

153

Transformation Rules

154

ETL Flows

155

MVNO CDR Flows

157

Arroyo

159

Related Work

160

Conclusions

160

References

161

Comparing Global Optimization and Default Settings of Stream-Based Joins (Experimental Paper)

162

Introduction

162

Meshjoin

164

Basic Operation

164

Architecture

165

Algorithm

165

Problem Definition

166

Tuning and Performance Comparisons

168

Proposed Investigation

168

Experimental Setup

169

Tuning of Disk-Buffer for Different Memory Budgets

170

Performance Analysis Using Default and Optimum Values for the Disk-Buffer Size

170

Cost Validation

172

Approach for Choosing the Default Value

173

Related Work

174

Conclusions and Future Work

175

References

175

Merging OLTP and OLAP – Back to the Future (Panel)

178

Introduction

178

Panelists

179

Summary

180

Author Index

181