Data Science

From Verify.Wiki
Data Science Definition

Data science involves using automated methods to analyze massive amounts of data (also referred to as big data) and to extract knowledge from them. One way to consider data science is as an evolutionary step in interdisciplinary fields like business analysis that incorporate computer science, modeling, statistics, analytics, and mathematics. Data science is also defined as a field that sits at the intersection of social science and statistics, information and computer science, and design. Data science has emerged to address the explosion in data volumes, a problem that traditional statistics cannot solve: statistics was developed to understand small samples, most of which arose from agriculture. The focus of data science is on extracting and storing data, assuring data quality, and understanding and communicating information for better decision making. [1]

As data from social media, sensors, web logs, business applications, and the general web grows rapidly, data science has become the core discipline for extracting “actionable insights” from these datasets to help make informed business decisions. A Harvard Business Review article named Data Scientist "The Sexiest Job of the 21st Century" [2], and the Mashable website reports that data science provides the best job options for candidates looking for work-life balance. [3] A 2011 study by McKinsey projected that the U.S. would face a shortage of 140,000 to 190,000 data science experts by 2018. [4] According to a 2014 Dice tech salary survey, the average annual salary in the USA is $115,000 for an R (programming language) expert and $101,000 for a Python (programming language) expert. [5]

Application

Data science is a discipline that arises out of the problem of analyzing and understanding big data sets. It is an interdisciplinary field that employs techniques from various disciplines such as mathematics, statistics, computer science, information science and business analytics. Techniques frequently used in data science include data mining, pattern recognition, data analysis and visualization, probability, and machine learning. By utilizing these techniques, data science investigates problems in various domains such as marketing analytics and optimization, risk management, agriculture, public policy, fraud detection, health care and public transport.

In general, data science is transforming traditional ways of analyzing problems and creating new solutions. Over time, the techniques that data scientists use will evolve and become more sophisticated, allowing data science to tackle age-old challenges in new ways. Health care, urban living and business are areas data science can be seen in action today.[6]

The main challenge for hospitals, as cost pressures tighten, is to treat as many patients as they can efficiently while improving the quality of care. Instrument and machine data is increasingly being used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a one percent efficiency gain could yield more than $63 billion in global healthcare savings.

Some of the use cases of data science are understanding customer churn, customer segmentation, customer relationship channel optimization, demand forecasting, fraud detection, customer support demand forecasting and medication effectiveness. [7]
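As a minimal sketch of the first use case listed above, customer churn can be summarized as the share of customers present at the start of a period who are no longer present at the end. The function name, the customer IDs and the example data below are hypothetical and chosen only for illustration.

```python
def churn_rate(start_customers, end_customers):
    """Fraction of customers at period start who are gone by period end."""
    start = set(start_customers)
    if not start:
        raise ValueError("need at least one customer at the start of the period")
    # Customers who were present at the start but are missing at the end.
    lost = start - set(end_customers)
    return len(lost) / len(start)


# Hypothetical customer IDs for one month.
january = ["c1", "c2", "c3", "c4", "c5"]
february = ["c1", "c2", "c4", "c5", "c6"]  # c3 left, c6 is new

print(f"Monthly churn: {churn_rate(january, february):.1%}")
```

New customers such as c6 do not affect the result, because the rate is measured only against the customers who were present at the start of the period.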



Case studies of data science in use

  • Prudential life insurance in the USA is using a predictive model to automatically classify risk and streamline the process of risk assessment. [8]
  • Telstra in Australia is using data to understand disruptions in its telecommunications network. [9]
  • The 2015 Data Science Bowl challenged data scientists to use MRI data for early diagnosis of heart disease. [10]
  • The FAA in the US is analyzing data from airlines, surveillance, weather, terrain and infrastructure to improve civil aviation. [11]
  • Vodafone and Argyle Data are analyzing network data to detect fraud. [12]
  • Verizon Wireless uses data science to keep customer churn below 1%. [13]
  • NSA PRISM uses data science to analyze social media data, phone calls, emails, financial transactions and other relevant data. This enables US intelligence services to detect and prevent terrorism threats. [14]
  • PayPal uses data science to improve the safety of its payment systems. [15]
  • eBay uses data science to optimize customer experience. [16]
  • Siemens uses data science to improve safety. [17]
  • Uber uses data science in all aspects of its business. They think all companies can benefit from it in all processes, including manufacturing processes, the retail industry, the financial services sector and the travel industry.

They further mention that they use it in their retail, finance and travel sectors in the following ways: in the retail industry, to analyse customer reviews from call centers, social media and so on in order to gain feedback and enhance performance; in the financial services sector, to innovate credit scoring, identify fraud and build products that satisfy customers; and in the travel industry, to predict delays and fuel consumption, organize promotions and make sure the company performs at its maximum capacity. [18]



Techniques used in data science

Differences in use of the term data science

There is no standardized use of the term data science, and it may mean different things to different people. Other terms closely related to data science include big data, statistics and business analytics. Jeff Wu argues that data science is equivalent to statistics, and that statistics should therefore be re-branded as data science and statisticians as data scientists. William Cleveland views data science as a distinct discipline that has emerged from the field of statistics by incorporating other disciplines. IBM notes that data science complements big data, statistics and business analysis. Training in data science is similar to training in data analysis and business analysis. However, the data scientist is distinguished by strong business understanding and excellent communication skills.

Ethical issues in data science

The primary concern, raised by privacy advocates, is that it is unethical to analyze people's data without their knowledge or consent. This is not a big problem in science and health research because the regulatory framework is stringent. In other industries ethical standards are not yet mature, and this poses a privacy risk. This is an area where data scientists need to make significant contributions to avert breaches of privacy. Legal frameworks and policies need to be put in place.[19] For example, the NSA PRISM program to gather intelligence has been authorized by federal judges, but it is debatable how misuse of this data will be forestalled.

Tools used in Data Science

  • R - An open source programming language used for statistical modelling and visualization
  • Python - A general purpose, open source, interpreted, object-oriented, high-level programming language
  • SAS - An integrated software environment designed for data extraction, transformation, access, mining, visualization and reporting
  • IBM SPSS - A software package used for statistical and predictive analysis.
  • Apache Hadoop - An open-source software framework for processing very large datasets by utilizing many machines. The framework can use anywhere from a few to several thousand machines to cope with growing amounts of data.
  • ETL tools - Tools used for extracting data from different systems, correcting errors and loading the data into a data warehouse. Informatica PowerCenter, IBM InfoSphere Information, Oracle Data Integrator, Microsoft SSIS and Pentaho Data Integration are some of the widely used tools. [20]
  • Business intelligence software - Tools used for analyzing and visualizing data using reports, charts and dashboards. Cognos, BusinessObjects, SQL Server Reporting Services and Tableau are some of the tools used.
  • Weka - An open source project that provides machine learning and data mining software
  • RapidMiner - A predictive analytics software offered under open source and commercial licensing
  • KNIME - An open source data analysis, reporting and integration platform
  • Relational databases - Data management systems for organizing and storing information. Widely used systems include Oracle, SQL Server, DB2, MySQL and PostgreSQL
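Hadoop's processing layer is built around the MapReduce programming model, in which a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. The sketch below imitates those three steps on a single machine in plain Python. It does not use Hadoop or any of its APIs, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    # On a cluster, each document could be mapped on a different machine.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical input documents.
docs = ["big data needs big clusters", "data science uses data"]
print(word_count(docs))
```

Because each map call looks at only one document and each reduce call at only one key, the work can be split across many machines, which is what lets the model scale as data volumes grow.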

Companies providing Data Science Technologies and Services

  • Kaggle - Kaggle is a community where data scientists compete with each other to solve Data Science problems
  • Yhat - A Data Science technology company that provides tools and systems that allow enterprises to turn data insights into data-driven products
  • Data Science Inc. - DataScience combines human intellect with machine-powered analysis to create insights from complex data for enterprises
  • Framed - Uses machine learning to predict customer churn
  • Interana - Develops technology to help businesses analyze streaming data in real time
  • ThoughtSpot - ThoughtSpot's Relational Search Appliance combines data from on-premise, cloud and desktop sources, and provides users with the ability to access that data with a simple search interface
  • AtScale - The AtScale Intelligence Platform is software that allows commonly used business intelligence tools to access data stored in Hadoop clusters
  • Confluent - Provides technology and services that help businesses adopt and use the Apache Kafka system
  • Kyvos Insights - OLAP (online analytical processing) software that carries out interactive, multidimensional analysis tasks on huge volumes of structured and unstructured Hadoop data
  • Looker - A cloud-based tool that can connect to a wide range of data sources, including Amazon Redshift, Google BigQuery, HP Vertica, Cloudera Impala, Apache Spark, SQL databases and others
  • DataHero - The DataHero cloud-based service collects data from such disparate sources as Box, Dropbox, Google Drive, Excel, Office 365, Marketo, HubSpot and Eventbrite, and turns it into charts and dashboards
  • Tamr - Tamr develops enterprise data unification software to integrate diverse, siloed data for business analytics tasks and downstream applications
  • Domo - Domo provides business managers with access to information scattered across many disparate sources through a single dashboard
  • Arcadia Data - Arcadia Data develops visual analytics software by directly accessing data stored in Hadoop clusters
  • PwC (Data Science) - Provides consulting services in Data Science
  • Accenture (Data Science) - Provides consulting services in Data Science
  • Palantir - Provides technologies and services in Data Science
  • SAS (Big Data) - Provides software and services to analyze big data
  • Oracle (Big Data) - Provides software, hardware and services to store and analyze big data
  • Teradata (Big Data) - Provides software, hardware and services to store and analyze big data
  • SAP (Big Data) - SAP's HANA platform provides in-memory storage and analytics to crunch big data
  • IBM (Big Data) - Provides hardware, software and services to store and analyze big data. IBM's Watson system is used in many data science projects that involve machine learning
  • Ayasdi - Provides 3-D mapping solutions to unearth trends in big data
  • Splunk - Provides a platform to analyze machine generated operational data such as logs to find trends
  • Alpine Data Labs - Offers a Hadoop-based data analytics platform
  • Alteryx - Provides software that combines structured and unstructured data from multiple sources into one database to conduct predictive, spatial and statistical analysis
  • Attivio - Provides search and discovery technology that integrates structured and unstructured information from various sources
  • Birst - Offers a Software-as-a-Service business intelligence platform with visual analytics and an automated data warehouse system to store and analyze big data
  • Continuum Analytics - Develops data analytics software based on the Python programming language
  • Datameer - Helps business users of Hadoop integrate, analyze and visualize large volumes of data
  • DataRPM - DataRPM uses machine learning to automatically perform advanced statistical analysis on Hadoop
  • Datawatch - Datawatch develops visual data discovery applications for creating data visualizations in real time from structured, semistructured and Hadoop-based data
  • Gainsight - Develops cloud-based predictive analytics software that's integrated with Salesforce.com's CRM application
  • Glassbeam - Develops Software-as-a-Service applications for machine log data analytics
  • GoodData - Develops a cloud-based business intelligence and big data analytics platform
  • Google (Big Data) - Google's BigQuery analytics-as-a-service technology performs SQL-like queries against massive amounts of data
  • Guavus - Develops tools to analyze streaming and stored data
  • H2O - Develops an open-source, in-memory prediction engine for data scientists and developers
  • Information Builders - Develops a software system that accelerates the deployment of master data management and data integration applications
  • Looker Data Sciences - Develops LookML data description language that businesses use to build customer data applications that work with Amazon Redshift, Teradata Aster, HP Vertica, Greenplum, Google BigQuery and other big data systems
  • Luminoso Technologies - Develops text analytics software
  • Metric Insights - Develops "push intelligence technology" to deliver insights and alerts to business users
  • MicroStrategy (Big Data) - Develops business intelligence and visualization tools
  • Panorama Software - Develops data visualization tools
  • ParStream - Develops a distributed, parallel processing columnar database
  • Platfora - Platfora offers a big data analytics toolset that's native to the Apache Hadoop platform
  • Predixion Software - Predixion offers a cloud-based, self-service predictive analytics platform
  • EMC Big Data - EMC's federation of companies, which includes Pivotal, RSA and VMware, provides customized solutions in big data and data science [21]
  • InsightSquared
  • Paxata
  • Trifacta
  • Cloudera
  • Sumo Logic
  • Visier
  • Tableau Software (Big Data)
  • MarkLogic
  • Actifio
  • Hortonworks
  • Informatica (Big Data)
  • Talend (Big Data)
  • Microsoft (Big Data)
  • MongoDB
  • Qlik (Big Data)
  • Salesforce.com (Big Data)
  • DataStax
  • Neo Technology
  • Dataguise
  • MapR Technologies
  • Dell (Big Data)
  • 1010Data
  • Amazon Web Services
  • HP (Big Data)
  • TIBCO (Big Data)
  • SnapLogic (Big Data)
  • Numerify
  • Logi Analytics
  • Pivotal
  • Syncsort
  • Basho Technologies
  • Recommind
  • Actian
  • Aerospike
  • BlueData Software
  • Citus Data
  • Concurrent
  • Altiscale
  • Attunity
  • Cask
  • Clearstory Data
  • Couchbase
  • Databricks
  • EnterpriseDB

Related Topics

Top Schools that teach data science

Many US universities offer Analytics/Data Mining/Data Science degrees [22]; a selection of them is listed in [23].


Data Science in the health care industry

Data science has taken on various dimensions in the health care industry. The branch that has seen a highly positive impact is the pharmaceutical industry, which uses data science to comply with regulations.

Pharmacovigilance (PV) is the practice of monitoring the effects of medications or drugs after they have been licensed for use, especially in order to identify and evaluate previously unreported adverse reactions.

Data science helps in many ways to identify the adverse reactions of medications and drugs, and produces insightful data that helps pharmaceutical companies identify previously unreported adverse reactions.
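One common way to screen reports for previously unreported adverse reactions is disproportionality analysis on a 2x2 table of spontaneous reports. The sketch below computes the proportional reporting ratio (PRR), which compares how often a reaction is reported for one drug with how often it is reported for all other drugs. The source does not say which method is used in practice, so this is an illustration only; the function name is hypothetical and the report counts are invented.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of report counts.

    a: reports of the reaction of interest for the drug of interest
    b: reports of other reactions for the drug of interest
    c: reports of the reaction of interest for all other drugs
    d: reports of other reactions for all other drugs
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    # Share of the drug's reports that mention the reaction,
    # divided by the same share among all other drugs.
    return (a / (a + b)) / (c / (c + d))


# Invented counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=980, c=100, d=98900)
print(f"PRR = {prr:.1f}")
```

A PRR well above 1 suggests that the reaction is reported disproportionately often for the drug and may merit clinical review; it does not by itself establish that the drug causes the reaction.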


Top Lifetime Tweets

Date Author Tweet
12 Apr 2013 @nytimes Universities Offer Courses in a Hot New Field: Data Science nyti.ms/10Zi7VA
26 Feb 2013 @bigdataborat In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data.
18 Jul 2014 @mashable Looking to achieve work-life balance? You may want to get into data science. on.mash.to/WmQRUZ

References

  1. http://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/abstract
  2. Data Scientist: The Sexiest Job of the 21st Century
  3. Top 10 Jobs With the Highest Work-Life Balance
  4. Big data McKinsey study
  5. http://marketing.dice.com/pdf/Dice_TechSalarySurvey_2015.pdf
  6. http://datascience.nyu.edu/applications/
  7. https://www.kaggle.com/wiki/DataScienceUseCases
  8. https://www.kaggle.com/c/prudential-life-insurance-assessment
  9. https://www.kaggle.com/c/telstra-recruiting-network
  10. https://www.kaggle.com/c/second-annual-data-science-bowl
  11. http://practicalanalytics.co/2015/05/25/big-data-analytics-use-cases/
  12. http://www.rcrwireless.com/20141014/big-data-analytics/telco-case-study-vodafone-argyle-data-tag6
  13. http://hortonworks.com/blog/modern-telecom-architectures-built-hadoop/
  14. http://practicalanalytics.co/2013/06/11/nsa-prism-the-mother-of-all-big-data-projects/
  15. http://www.teradata.com/Resources/Web-Casts/Applying-Analytics-to-Create-Safer-Global-Commerce-at-PayPal/?LangType=1033&LangSelect=true
  16. http://bigdata.teradata.com/ebay/
  17. http://blogs.teradata.com/customers/siemens-using-big-data-analytics-design-successful-future/
  18. https://www.marutitech.com/data-science-useful-businesses/
  19. http://www.informationweek.com/big-data/big-data-analytics/data-scientists-want-big-data-ethics-standards/d/d-id/1315798
  20. http://www.databaseetl.com/etl-tools-top-10-etl-tools-reviews/
  21. http://www.emc.com/big-data/expertise.htm
  22. http://www.kdnuggets.com/education/usa-canada.html
  23. http://www.mastersindatascience.org/schools/23-great-schools-with-masters-programs-in-data-science/

Verification history