
Posts

Showing posts from 2011

Data Warehousing Tools

The key general categories of data warehousing tools are:
• Spreadsheets
• Reporting and querying software: tools that extract, sort, summarize, and present selected data
• OLAP: online analytical processing
• Digital dashboards
• Data mining
• Decision engineering
• Process mining
• Business performance management
• Local information systems

Except for spreadsheets, these tools are sold as standalone tools, as suites of tools, as components of ERP systems, or as components of software targeted at a specific industry. The tools are sometimes packaged into data warehouse appliances.

Open source free products
• Eclipse BIRT Project
• JasperSoft
• Pentaho Community Edition
• RapidMiner
• SpagoBI
• R

Open source commercial products
• Palo (OLAP database): OLAP Server, Worksheet Server and ETL Server
• Pentaho: reporting, analysis, dashboard, data mining and workflow capabilities

Proprietary free products
• InetSoft
• MicroStrategy
• MicroStrategy Reporting Suite

Proprietary products
• ActiveR

The Data Warehousing Process

Stage 1: Determine Informational Requirements
• Identify and analyze existing informational capabilities.
• Identify from key users the significant business questions and key metrics that the target user group regards as its most important requirements for information.
• Decompose these metrics into their component parts with specific definitions.
• Map the component parts to the informational model and systems of record.

Stage 2: Evolutionary and Iterative Development Process
When you begin to develop your first data warehouse increment, the architecture is new and fresh. With the second and subsequent increments, the following is true:
• Start with one subject area (or subset or superset) and one target user group.
• Continue and add subject areas, user groups and informational capabilities to the architecture based on the organization's requirements for information, not technology.
• Improvements are made from what was learned from previous increments.
• Impro
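The Stage 1 steps of decomposing a metric and mapping its parts to systems of record can be pictured as a small data structure. The sketch below is hypothetical. The metric, the component names and definitions, and the `findUnmappedComponents` helper are illustrative assumptions and do not come from any particular methodology or tool. The helper flags the components that still lack a definition or a system of record.

```javascript
// Hypothetical Stage 1 artifact: one key metric broken into component parts,
// each with a specific definition and the system of record it maps to.
// All names and definitions here are illustrative assumptions.
var metric = {
  name: "Part-time share of credit hours taught",
  formula: "partTimeCreditHours / totalCreditHours",
  components: [
    {
      name: "totalCreditHours",
      definition: "Credit hours of all sections offered in the year",
      systemOfRecord: "Registrar course scheduling data"
    },
    {
      name: "partTimeCreditHours",
      definition: "Credit hours of sections whose instructor is part-time",
      systemOfRecord: null // not mapped yet: needs HR/payroll employment status
    }
  ]
};

// Returns the names of components missing a definition or a
// system-of-record mapping, i.e. the Stage 1 work still left to do.
function findUnmappedComponents(m) {
  return m.components
    .filter(function (c) { return !c.definition || !c.systemOfRecord; })
    .map(function (c) { return c.name; });
}

console.log(findUnmappedComponents(metric)); // → ["partTimeCreditHours"]
```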

HOW IS THE WAREHOUSE DIFFERENT?

The data warehouse is distinctly different from the operational data used and maintained by day-to-day operational systems. Data warehousing is not simply an “access wrapper” for operational data, where data is simply “dumped” into tables for direct access. Among the differences:

Comparison of operational systems and data warehousing systems
• Operational systems are generally designed to support high-volume transaction processing with minimal back-end reporting.
  Data warehousing systems are generally designed to support high-volume analytical processing (i.e., OLAP) and subsequent, often elaborate, report generation.
• Operational systems are generally process-oriented or process-driven, meaning that they are focused on specific business processes or tasks. Example tasks include billing, registration, etc.
  Data warehousing systems are generally subject-oriented, organized around business areas that
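The following minimal sketch uses hypothetical billing data to make the contrast concrete. Operational records hold one row per transaction, in the form a billing process writes them. A warehouse-style view reorganizes those rows around a subject (here, the customer) and a time period so they can be analyzed. The field names and the `summarizeByCustomerMonth` function are illustrative assumptions, not a standard warehouse API.

```javascript
// Operational-style data (hypothetical): one row per billing transaction.
var billingTransactions = [
  { customerId: "C1", date: "2011-03-02", amount: 120 },
  { customerId: "C2", date: "2011-03-15", amount: 80 },
  { customerId: "C1", date: "2011-04-01", amount: 60 },
  { customerId: "C1", date: "2011-04-20", amount: 40 }
];

// Warehouse-style view: totals organized by customer (the subject) and month,
// rather than by the individual billing events that produced them.
function summarizeByCustomerMonth(transactions) {
  var summary = {};
  transactions.forEach(function (t) {
    var month = t.date.slice(0, 7); // "YYYY-MM"
    var key = t.customerId + "|" + month;
    if (!summary[key]) {
      summary[key] = { customerId: t.customerId, month: month, total: 0, count: 0 };
    }
    summary[key].total += t.amount;
    summary[key].count += 1;
  });
  // Keys are non-numeric strings, so Object.keys keeps insertion order.
  return Object.keys(summary).map(function (k) { return summary[k]; });
}

console.log(summarizeByCustomerMonth(billingTransactions));
```

In a real warehouse this reshaping would typically happen in the load process rather than at query time. The sketch only shows the change in organization, from process events to subject-oriented summaries.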

Data Warehouse-Concepts

A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. At Rutgers, these systems include the registrar’s data on students (widely known as the SRDB), human resource and payroll databases, course scheduling data, and data on financial aid. In a data warehouse environment, data only comes to have value to end users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision-making. For example, an academic unit needs diachronic information (information tracked over time) about the instructional output of its different faculty members to gauge whether it is becoming more or less reliant on part-time faculty. The data warehouse is that portion of an overall Architected Data Environment that serves as the single integrated source of data for processing information. The data w
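The following minimal sketch, built on hypothetical records, shows data becoming information. Registrar section data and HR employment status are separate sources of facts. Joining them and aggregating by year produces the kind of diachronic measure described above: the share of credit hours taught by part-time faculty. Several details are assumptions made for illustration: the record layouts, the use of credit hours as the measure of instructional output, and the `partTimeShareByYear` function. The sketch also ignores the fact that an instructor's employment status can change over time.

```javascript
// Hypothetical facts from two operational sources.
// Registrar: one row per course section taught.
var registrarSections = [
  { year: 2009, instructorId: "f1", creditHours: 3 },
  { year: 2009, instructorId: "f2", creditHours: 3 },
  { year: 2009, instructorId: "f3", creditHours: 6 },
  { year: 2010, instructorId: "f1", creditHours: 3 },
  { year: 2010, instructorId: "f3", creditHours: 3 },
  { year: 2010, instructorId: "f4", creditHours: 6 }
];
// HR/payroll: employment status per instructor (assumed constant here).
var hrStatus = { f1: "full-time", f2: "part-time", f3: "full-time", f4: "part-time" };

// Integrates the two sources and returns, per year (ascending), the fraction
// of credit hours taught by part-time faculty.
function partTimeShareByYear(sections, status) {
  var totals = {};
  sections.forEach(function (s) {
    var t = totals[s.year] || (totals[s.year] = { all: 0, partTime: 0 });
    t.all += s.creditHours;
    if (status[s.instructorId] === "part-time") {
      t.partTime += s.creditHours;
    }
  });
  return Object.keys(totals)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (year) {
      return { year: year, share: totals[year].partTime / totals[year].all };
    });
}

console.log(partTimeShareByYear(registrarSections, hrStatus));
// → [ { year: 2009, share: 0.25 }, { year: 2010, share: 0.5 } ]
```

Neither source answers the question alone. The answer appears only once the facts are integrated and organized over time.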

Sleep sort in JavaScript and closures

There is this "genius" sorting algorithm called "sleep sort". I don't know what got into me, but I decided: "Let's do it in JavaScript!" I thought it wouldn't take more than 2 minutes. Starting from that wrong and overconfident assumption, it turned into a journey into details of JavaScript that I had never bothered to pay much attention to. I learnt quite a few things along the way and will share them in this blog.

Here is the code I wrote:

    array = [1, 9, 4, 8, 2, 3, 6];
    for (i = 0; i < array.length; i++) {
        window.setTimeout("console.log(array[i])", array[i] * 1000);
    }

What the code was supposed to do was, for each integer in the array, wait that many seconds and then print that integer. So the integer 9 should be logged to the console after 9 seconds, and 2 after 2 seconds.

Here is what it logged into the conso
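The following is one common way to make the loop above behave as intended. It is offered as a sketch and is not necessarily the fix this post goes on to describe. The idea is to pass `setTimeout` a function instead of a string and to capture each value in a closure. A string callback is evaluated in the global scope only when the timer fires. By then the loop has finished and `i` equals `array.length`, so `array[i]` is `undefined`. In the sketch, an immediately invoked function expression gives each iteration its own `value` binding. The `sleepSort` name, the `msPerUnit` and `onValue` parameters, and the Promise wrapper are assumptions added for illustration. Sleep sort only makes sense for non-negative numbers, and values whose delays are very close together may fire out of order.

```javascript
// Sleep sort sketch: every value schedules its own callback after a delay
// proportional to the value, so smaller values fire (and are collected) first.
// Works only for non-negative numbers; nearly equal delays may race.
function sleepSort(values, msPerUnit, onValue) {
  return new Promise(function (resolve) {
    var sorted = [];
    if (values.length === 0) {
      resolve(sorted);
      return;
    }
    for (var i = 0; i < values.length; i++) {
      // The IIFE runs immediately with the current element, so each timer
      // callback closes over its own `value` instead of the shared `i`.
      (function (value) {
        setTimeout(function () {
          onValue(value);
          sorted.push(value);
          if (sorted.length === values.length) {
            resolve(sorted);
          }
        }, value * msPerUnit);
      })(values[i]);
    }
  });
}

// Usage: the post waits 1000 ms per unit; 100 ms keeps the demo short.
var array = [1, 9, 4, 8, 2, 3, 6];
sleepSort(array, 100, function (v) {
  console.log(v); // logs 1, 2, 3, 4, 6, 8, 9, one at a time
}).then(function (sorted) {
  console.log("sorted:", sorted.join(", "));
});
```

In engines that support ES2015, declaring the loop variable as `for (let i = 0; ...)` also creates a fresh binding per iteration. With that change, `setTimeout(function () { console.log(array[i]); }, array[i] * 1000)` works without the IIFE.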