Enabling Real-Time Business Intelligence - Third International Workshop, BIRTE 2009, Held at the 35th International Conference on Very Large Databases, VLDB 2009, Lyon, France, August 24, 2009, Revised Selected Papers
Preface
5
Organization
6
Table of Contents
7
Queries over Unstructured Data: Probabilistic Methods to the Rescue (Keynote)
8
Unstructured Data in Enterprises
8
Probabilistic Models for Information Extraction
10
Representing Noisy Extractions as Imprecise Databases
11
Multi-attribute Extractions
13
Imprecise Data Models for Representing Uncertainty of De-duplication
15
Probability of Two Records Being Duplicates
15
Probability over Entity Groupings
15
Queries over Imprecise Duplicates
16
Concluding Remarks
18
References
19
Federated Stream Processing Support for Real-Time Business Intelligence Applications
21
Introduction
21
Related Work
22
The MaxStream Federated Stream Processing System
24
Architecture
26
Two Key Building Blocks
28
Hybrid Queries: Using Persistence with Streams
30
Using MaxStream in Real-Time BI Scenarios
32
Reducing Latency in Event-Driven Business Intelligence
32
Persistent Events in Supply-Chain Monitoring
33
Other Real-Time BI Applications
34
Feasibility Study
34
Conclusions and Future Directions
36
References
37
VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans
39
Introduction
39
Preliminaries
42
Review of the Chain Scheduling
42
Problem Definition
43
The VPipe Execution Scheme
44
Change of Operator Logic
45
Discussion
47
Stochastic Analysis of Chain
47
System Model Basis
48
Case 1: System Analysis for SOS Synchronization
48
Case 2: System Analysis for IDS Synchronization
50
Performance Study
53
Experiment 1: Response Time Comparison
53
Experiment 2: Broken Pipeline Probability
54
Related Work
54
Conclusion
55
References
55
Ad-Hoc Queries over Document Collections – A Case Study
57
Introduction
57
Query Planning and Query Plan Execution
59
Understanding “Human-Powered” Query Execution Strategies
59
Elementary Plan Operators
60
The Coverage-Join (CJ) and Density-Join (DJ) Operator
64
Example Query and Example Plans
64
Plan Enumeration
65
Case Study
66
Heuristics for Plan Selection
66
Results and Discussion
67
Related Work
69
Summary and Future Work
70
References
71
Appendix: Implementing the KEYWORD-Operator
72
ASSET Queries: A Set-Oriented and Column-Wise Approach to Modern OLAP
73
Introduction
73
Grouping Analysis: A Retrospective
74
Group by
75
Cubes
75
Grouping Variables and the MD-Join
76
Windows
76
MapReduce
77
Associated Sets (ASSET) Queries
77
Definitions
77
SQL Syntax
78
DataMingler: A Spreadsheet-Like GUI
79
ASSET Queries and Data Streams (COSTES)
80
Financial Application Motivating Examples
81
COSTES: Continuous Spreadsheet-Like Computations
83
ASSET Queries and Persistent Data Sources (ASSET QE)
84
Social Networks: A Motivating Example
84
ASSET Query Engine (QE)
86
Conclusions and Future Work
88
References
89
Evaluation of Load Scheduling Strategies for Real-Time Data Warehouse Environments
91
Introduction
91
System Model and Problem Statement
93
System Architecture
93
Workload Model
94
Scheduling Performance Objective
95
Problem Statement
96
Scheduling Policies
97
Scheduling Algorithms for Push-Based Update Propagation
97
Evaluation and Discussion
98
Simulation Framework
98
Effect of the Data Production Process Length
99
Comparison of Local and Global Scheduling
100
Effects of Stage-Concurrent and Long-Running Updates
101
Ratio of Stage-Concurrent Updates
102
Pruning of Irretrievable Queries
103
Effects of Long-Running Update and Queries during Runtime
103
Related Work
104
Conclusion
105
References
106
Near Real-Time Data Warehousing Using State-of-the-Art ETL Tools
107
Near Real-Time Data Warehousing
107
Related Work
108
Data Warehouse Refreshment Anomalies
110
Properties of Operational Data Sources
114
Preventing Refreshment Anomalies
115
Preventing a Change Data Mismatch
116
Making Change Propagation Anomaly-Proof
119
Conclusion
123
References
123
Addressing BI Transactional Flows in the Real-Time Enterprise Using GoldenGate TDM (Industrial Paper)
125
Background
125
Operational Data, and the Real-Time Enterprise
126
Transactional Data
126
Real-Time
127
Heterogeneous Systems and Interoperability
129
Transactional Consistency
130
Emerging Trends and Problems
130
Amount of Data
130
Adoption of Newer Datatypes
131
Growing Number of Users
131
Changing Nature of Applications
131
Micro-batching
132
Real-Time Data Acquisition
132
ETL and Real-Time Challenges
132
ESB
133
Change Data Capture (CDC)
133
GoldenGate TDM Platform
136
GoldenGate Architecture, Key Components
137
Key Architectural Features and Benefits
141
Use Cases
143
Example Customer Case Studies with Business Challenges, Real-Time Solutions
144
Challenges
146
Conclusion
147
References
148
Near Real-Time Call Detail Record ETL Flows (Industrial Paper)
149
Introduction
149
MVNO Background
150
Problem Statement
151
Our Solution
153
Transformation Rules
154
ETL Flows
155
MVNO CDR Flows
157
Arroyo
159
Related Work
160
Conclusions
160
References
161
Comparing Global Optimization and Default Settings of Stream-Based Joins (Experimental Paper)
162
Introduction
162
Meshjoin
164
Basic Operation
164
Architecture
165
Algorithm
165
Problem Definition
166
Tuning and Performance Comparisons
168
Proposed Investigation
168
Experimental Setup
169
Tuning of Disk-Buffer for Different Memory Budgets
170
Performance Analysis Using Default and Optimum Values for the Disk-Buffer Size
170
Cost Validation
172
Approach for Choosing the Default Value
173
Related Work
174
Conclusions and Future Work
175
References
175
Merging OLTP and OLAP – Back to the Future (Panel)
178
Introduction
178
Panelists
179
Summary
180
Author Index
181