Data aggregation is the process of collecting and organizing data from disparate sources into a database. An early example is the Yellow Pages: an extensive list of local businesses, organized by category and distributed in a big yellow book. It was useful if you (the customer) were looking for a local plumber or wanted to order a pizza in the early hours of the morning.
Today, the Data-as-a-Service (DaaS) market offers businesses easy access to swathes of rich, organized data, available in increasingly innovative forms. A business will typically buy access to a large dataset and integrate it with their own product or service. A subscription fee will provide access to a cloud-based, white-labelled database of anything from standardized product details to geographical information.
Hadoop-based data discovery enables business users to explore and find insights across diverse data (such as clickstreams, social, sensor and transaction data) that is stored and managed in the Hadoop Distributed File System (HDFS). In combination with the interactivity of Spark, it can bypass the extensive modeling required by SQL-based approaches, the specialized skills to generate custom MapReduce, Hive or Pig queries, or the performance penalty of querying Hadoop through Hive.
Model factories bring automation and scalability to the process of building and deploying predictive models. These solutions enable data scientists to build a larger number of complex predictive models and use computationally intensive resources to iteratively search for the best model from a set of candidates.
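The search over candidate models described above can be sketched in a few lines of Python. This is a minimal illustration, not a production model factory: the two candidates (a mean predictor and a one-feature least-squares line), the synthetic data, and the names `fit_mean`, `fit_linear` and `model_factory` are all assumptions made for the example.

```python
import random
import statistics

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def model_factory(candidates, train, holdout):
    """Fit every candidate on train, score on holdout, return the best name and all scores."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *holdout)
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data (hypothetical): y = 2x + 1 plus a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
train = (xs[::2], ys[::2])        # even-indexed points for fitting
holdout = (xs[1::2], ys[1::2])    # odd-indexed points for scoring

best_name, scores = model_factory(
    {"mean": fit_mean, "linear": fit_linear}, train, holdout
)
print(best_name, scores)
```

A real model factory would automate the same loop over many more candidates, hyperparameters and datasets, and would usually add deployment and monitoring steps that this sketch leaves out.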
Natural Language Generation (NLG) combines natural-language processing (NLP) with machine learning and artificial intelligence to dynamically identify the most relevant insights and context in data (trends, relationships, correlation patterns) and then automatically generate a personalized narrative for each user in their context, to explain meaning or highlight key findings in data.
An analytics marketplace is a private or public marketplace for creating, selling, buying, executing, and democratizing analytics. An analytics marketplace can also include underlying data ingestion, data preparation, visualization, and advanced analytic components. The orchestrator of the marketplace provides authentication, billing, metering, monitoring, and other services such as search, collaboration, debugging, testing, and technical support. A few visionary companies (such as Microsoft, RapidMiner and FICO) launched such marketplaces in 2014-2015.
Uplift modeling is a predictive analytics technique that directly predicts the incremental impact of an action (e.g., medical treatment, marketing or loyalty action) on an individual’s behavior. It is increasingly considered superior to naive response modeling, as it better accounts for negative, positive and no responses.
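To make "incremental impact" concrete, here is a rough sketch that estimates uplift as the difference between the response rate of a treated group and that of a control group, computed per customer segment. It is a simplification: real uplift models are usually fitted on individual-level features, and the segment names, counts and the helper `make_group` are hypothetical.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Estimate uplift per segment as P(response | treated) - P(response | control).

    records: iterable of (segment, treated, responded) tuples.
    """
    # counts[segment][treated] = [number of responders, group size]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][treated]
        cell[0] += int(responded)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_resp, t_n), (c_resp, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue  # cannot compare without both a treated and a control group
        uplift[segment] = t_resp / t_n - c_resp / c_n
    return uplift

def make_group(segment, treated, responders, total):
    """Hypothetical helper: build `total` records, the first `responders` of which responded."""
    return [(segment, treated, i < responders) for i in range(total)]

records = (
    make_group("persuadable", True, 8, 10) + make_group("persuadable", False, 2, 10)
    + make_group("sure_thing", True, 9, 10) + make_group("sure_thing", False, 9, 10)
    + make_group("do_not_disturb", True, 1, 10) + make_group("do_not_disturb", False, 4, 10)
)

uplift = uplift_by_segment(records)
for segment, value in sorted(uplift.items(), key=lambda kv: -kv[1]):
    print(f"{segment}: {value:+.2f}")
```

A naive response model would rank the "sure_thing" segment highest because it responds most often, whereas the uplift estimate shows the action changes nothing there and is actually harmful in the "do_not_disturb" segment.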
Data scrubbing, also called data cleansing, is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated. An organization in a data-intensive field like banking, insurance, retailing, telecommunications, or transportation might use a data scrubbing tool to systematically examine data for flaws by using rules, algorithms, and look-up tables. Typically, a database scrubbing tool includes programs that are capable of correcting a number of specific types of mistakes, such as adding missing zip codes or finding duplicate records. Using a data scrubbing tool can save a database administrator a significant amount of time and can be less costly than fixing errors manually.
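The kinds of fixes mentioned above (rules, look-up tables, missing zip codes, duplicate records) can be illustrated with a small Python sketch. The record layout, the `ZIP_BY_CITY` table and the sample rows are invented for the example; a real scrubbing tool would apply far more rules.

```python
# Hypothetical look-up table used to fill in missing zip codes.
ZIP_BY_CITY = {"springfield": "62701", "portland": "97201"}

def scrub(records):
    """Return cleaned records: normalized whitespace, zips filled, incomplete and duplicate rows removed."""
    cleaned, seen = [], set()
    for rec in records:
        # Rule: collapse stray whitespace in text fields (None is treated as empty).
        name = " ".join((rec.get("name") or "").split())
        city = " ".join((rec.get("city") or "").split())
        zip_code = (rec.get("zip") or "").strip()

        # Rule: a record without a name is incomplete and is dropped.
        if not name:
            continue

        # Look-up table: fill a missing zip code from the city when possible.
        if not zip_code:
            zip_code = ZIP_BY_CITY.get(city.lower(), "")

        # Rule: the same name and city (case-insensitive) counts as a duplicate.
        key = (name.lower(), city.lower())
        if key in seen:
            continue
        seen.add(key)

        cleaned.append({"name": name, "city": city, "zip": zip_code})
    return cleaned

raw = [
    {"name": "  Ada  Lovelace ", "city": "Springfield", "zip": ""},
    {"name": "ada lovelace", "city": "springfield", "zip": "62701"},
    {"name": "", "city": "Portland", "zip": "97201"},
    {"name": "Alan Turing", "city": "Portland", "zip": None},
]

clean = scrub(raw)
for row in clean:
    print(row)
```

Note that this sketch keeps the first spelling it sees for a duplicated record; which copy to keep, or how to merge them, is a policy decision that a real tool would make configurable.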
Reference lines, bands or distributions can be added to views in Tableau to emphasize particular values or areas that help in interpreting your data. A reference line is typically used to mark a specific value on an axis. The value can be a constant, or a computed value based on a specific field. A reference line can be added on any continuous axis. In the example view, Tableau has generated a reference line showing the average sum of sales across individual product categories.
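The value behind an "average sum of sales" reference line can be reproduced outside Tableau. The sketch below is plain Python, not Tableau's API: it sums sales per category and then averages those sums, using made-up rows and function names.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical sales rows: (product category, sale amount).
sales = [
    ("Furniture", 120.0),
    ("Furniture", 80.0),
    ("Technology", 300.0),
    ("Office Supplies", 40.0),
    ("Office Supplies", 60.0),
]

def sum_by_category(rows):
    """Aggregate sales per category, like SUM(Sales) broken down by category."""
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)

def average_reference_line(totals):
    """Computed reference value: the average of the per-category sums."""
    return fmean(list(totals.values()))

totals = sum_by_category(sales)
reference_value = average_reference_line(totals)
print(totals)
print(reference_value)
```

The reference line is then a single horizontal (or vertical) line drawn at `reference_value` on the continuous sales axis; a constant reference line would simply skip the computation and use a fixed number.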
Reference distributions are a variation of reference bands. They typically shade areas above, below and between two requested statistics. The difference is that they represent a distribution of values along an axis and can be based on confidence intervals, percentages, percentiles and similar measures.
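As a rough illustration of the statistics such a distribution might shade between, the following Python sketch computes quartile boundaries for a set of axis values. It is not Tableau's implementation; the values and the choice of quartiles (rather than confidence intervals or percentages) are assumptions for the example.

```python
from statistics import quantiles

# Hypothetical values plotted along a continuous axis.
values = [100.0, 200.0, 300.0, 400.0, 500.0]

def quartile_bounds(data):
    """Return (Q1, median, Q3), treating the data as the whole population."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return q1, median, q3

q1, median, q3 = quartile_bounds(values)
# A reference distribution could shade the area between q1 and q3,
# and optionally the areas below q1 and above q3 in a lighter tone.
print(q1, median, q3)
```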