Big Data, Data Science, Artificial Intelligence and Digital Transformation: Is there a Shangri La?
Wagner Meira Jr., PhD
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
February 17, 2020
Meira Jr. (UFMG) Is there a Shangri La? February 17, 2020 1 / 39
What is Shangri La?
Shangri-La is a fictional place described in the 1933 novel Lost Horizon by British author James Hilton. Shangri-La has become synonymous with any earthly paradise: a permanently happy land, isolated from the world.
Big Data?
Big Data: Is it a solved problem?
IoT?
Real time?
Heterogeneity?
Data Protection and Privacy Assurance?
Data Science
Data Science is an area that aims to systematize processes and practices to explore, analyze, and generate models that enable description, prediction, and prescription based on diverse data. Overall, it targets better performance and efficacy of organizations and a better quality of life for both citizens and societies. Data Science models and transforms data to support decision making through computational thinking tasks.
Data Science Process
Data Science Areas
Data Science Complexities
Complexity refers to sophisticated characteristics in data science systems.
Data science problems may be viewed as complex systems that involve several kinds of complexity:
Data complexity
Behavior complexity
Domain complexity
Social complexity
Environment complexity
Learning complexity
Deliverable complexity
Data Science Jobs
Data scientists are responsible for handling raw data, analyzing it with various techniques, and presenting insights in a way that is useful for anticipating business problems. A data scientist uses machine learning to predict the future based on past patterns. The average salary (US) for a data scientist is $119,000.
A data analyst analyzes data, a process that also requires building systems that help business users draw out insights and ensure data quality. The role is to collect and process data and to perform statistical analyses. Data analysts extract meaningful information from the available data, using tools such as R or SAS. Not only IT companies but companies across all industries, e.g., healthcare, automotive, finance, retail, and insurance, need data analysts to run their business. The average annual salary (US) for data analysts is $62,000.
Data Science Jobs
The role of the data architect is to create data management systems that integrate, protect, and maintain data sources and company information. The data architect is responsible for database architecture, design, creation, and optimization. Data architects are expected to master technologies such as Pig, Spark, SQL, XML, and Hive. The average annual salary (US) for this career is $100,000.
Data engineers do not analyze data; instead, they build the software infrastructure that other professionals use to do that work. This requires in-depth knowledge of Hadoop and Big Data technologies such as MapReduce, Hive, and Pig, as well as NoSQL and SQL technologies. Their role is to develop, test, and maintain large-scale processing systems. More than 50 percent of the work is data wrangling, at which data engineers with a background in software engineering excel. The average salary (US) for this job is $95,000.
Artificial Intelligence
Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. Colloquially, the term “artificial intelligence” is often used to describe machines (or computers) that mimic “cognitive” functions that humans associate with the human mind, such as “learning” and “problem solving”.
Analytical AI has only characteristics consistent with cognitive intelligence: it generates a cognitive representation of the world and uses learning based on past experience to inform future decisions.
Human-inspired AI has elements of both cognitive and emotional intelligence: in addition to cognitive elements, it understands human emotions and considers them in its decision making.
Humanized AI shows characteristics of all types of competencies (i.e., cognitive, emotional, and social intelligence) and is able to be self-conscious and self-aware in interactions.
Impact on society?
Digital Transformation
Digital transformation (DX) is the reworking of the products, processes, and strategies within an organization by leveraging current technologies.
Common challenges:
Scale: How can an established organization that operates on an analog business model fundamentally change the way it identifies, develops, and launches new ventures without losing effectiveness?
Talent: How can organizations that desire digital transformation train, retain, and attract the most talented individuals to change their organization without uprooting or losing sight of the collaborators who made them great companies in the first place?
Metrics: How do newly digital organizations measure their successes and failures in comparison to their formerly analog selves?
Digital Transformation
Go Digital: It is a matter of infrastructure and investment.
Be Digital: It depends on changes in culture and practices, and on much more investment.
What’s the slope of your organization?
Digital Transformation
Analytics across time
Data
Report
Statistical analysis
Descriptive models
Predictive models
Human-in-the-loop models
Analytics across time: Navigation information
Data: vehicle location
Report: location history
Statistical analysis: probability distribution of route duration
Descriptive models: segmentation of route duration per time period
Predictive models: estimate route duration considering conditions
Human-in-the-loop models: route adaptation assisted by app.
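The progression from data to human-in-the-loop models can be made concrete with a small sketch. It assumes that the raw vehicle-location data have already been reduced to per-trip (departure hour, duration) records, and it uses an invented trip log and a hypothetical peak/off-peak split; none of these values come from the talk. Each function corresponds to one stage: report, statistical analysis, descriptive model, predictive model, and a human-in-the-loop suggestion that leaves the final decision to the driver.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical trip log: (departure hour, route duration in minutes).
# These numbers are invented for illustration; they are not real measurements.
TRIPS = [(7, 42.0), (8, 45.0), (8, 47.0), (12, 25.0), (13, 27.0),
         (18, 50.0), (18, 48.0), (22, 20.0), (23, 22.0)]


def period(hour: int) -> str:
    """Assumed segmentation of the day into peak and off-peak periods."""
    if 7 <= hour < 10 or 17 <= hour < 20:
        return "peak"
    return "off-peak"


def report(trips):
    """Report: the trip history ordered by departure hour."""
    return sorted(trips)


def summarize(trips):
    """Statistical analysis: summary of the route-duration distribution."""
    durations = [d for _, d in trips]
    return {"mean": mean(durations), "stdev": stdev(durations),
            "min": min(durations), "max": max(durations)}


def segment(trips):
    """Descriptive model: mean route duration per time period."""
    groups = defaultdict(list)
    for hour, duration in trips:
        groups[period(hour)].append(duration)
    return {p: mean(ds) for p, ds in groups.items()}


def predict(model, hour):
    """Predictive model: estimate duration from the departure hour's segment."""
    return model[period(hour)]


def suggest(model, hour, max_minutes):
    """Human-in-the-loop: flag slow departures; the driver decides what to do."""
    estimate = predict(model, hour)
    if estimate > max_minutes:
        return f"about {estimate:.0f} min expected; consider another route or time"
    return f"about {estimate:.0f} min expected"


model = segment(TRIPS)
print(summarize(TRIPS))
print(model)
print(suggest(model, 18, 40))
```

A real navigation service would likely condition on far richer information (traffic, weather, events) and use stronger models; the sketch only shows how each stage builds on the previous one.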
Data Science Issues
Data
Models and techniques
Technology
Skills and culture
Data
Data Science Effort Distribution
Top Data Science Methods
Models and Techniques
Top Analytics Software
Technology Landscape
Data Science Skills
Computational thinking
Analytical ability
Quantitative ability
Algorithmic ability
Computational literacy
Data Science Ecosystem
Leaders: Understand the potential of DS and create the conditions for its development.
Data Scientists: Design and implement data-intensive models, methods, and techniques.
Translator: Identify opportunities and promote the matching between demands and available resources.
End user: Collaborators who will be empowered by Data Science.
Road to Data Science
Are we done?
FATES
Fairness
Accountability
Transparency
Ethics
Safety and Security
Fairness
Fairness means that the models we build are used to make unbiased decisions (e.g., classifications) or predictions.
Defining fairness formally is an active area of research, of interest to computer scientists, social scientists, and legal scholars.
Example: A ProPublica study showed that a machine learning model used by courts in the US to predict recidivism is biased against Black defendants compared with white defendants. This study led academics to show that two different but reasonable notions of fairness cannot, in general, be satisfied simultaneously (in particular, when base rates differ between groups).
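The slide does not state which two fairness notions are involved. One widely cited form of the tension compares predictive parity (equal precision, or PPV, across groups) with equal error rates across groups. For any classifier, the false positive rate satisfies FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR), where p is the group's base rate, so if two groups share the same PPV and FNR but have different base rates, their FPRs must differ. The minimal sketch below checks this with exact fractions on two hypothetical groups whose confusion-matrix counts are invented for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one group (hypothetical numbers below)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def prevalence(self) -> Fraction:
        # Base rate: share of the group that is actually positive.
        return Fraction(self.tp + self.fn, self.tp + self.fp + self.fn + self.tn)

    def ppv(self) -> Fraction:
        # Precision: share of positive predictions that are correct.
        return Fraction(self.tp, self.tp + self.fp)

    def fpr(self) -> Fraction:
        # Share of actual negatives wrongly predicted positive.
        return Fraction(self.fp, self.fp + self.tn)

    def fnr(self) -> Fraction:
        # Share of actual positives wrongly predicted negative.
        return Fraction(self.fn, self.fn + self.tp)


def implied_fpr(p: Fraction, ppv: Fraction, fnr: Fraction) -> Fraction:
    """FPR forced by prevalence p, PPV and FNR: p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)


# Two hypothetical groups with different base rates (50% vs 20%),
# but identical PPV (4/5) and FNR (1/5).
group_a = Confusion(tp=40, fp=10, fn=10, tn=40)
group_b = Confusion(tp=16, fp=4, fn=4, tn=76)

for name, g in [("A", group_a), ("B", group_b)]:
    print(name, "prevalence", g.prevalence(), "PPV", g.ppv(),
          "FNR", g.fnr(), "FPR", g.fpr())
```

With these counts, group A has an FPR of 1/5 while group B has an FPR of 1/20: equalizing precision and miss rates forces unequal false alarm rates whenever base rates differ.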
Accountability
Accountability means determining and assigning responsibility (to someone or something) for a judgment made by a machine. Assigning responsibility can be elusive because there are people, processes, and organizations, as well as algorithms, models, and data, behind any judgment.
Accountability
Example: When Google ads for high-paying jobs were shown more often to men than to women, it was not at all clear whom or what to blame. Who is responsible for the discriminatory results? We can think of a few reasons why they may have appeared:
The advertiser's targeting of the ad
Google explicitly programming the system to show the ad less often to females
Male and female consumers responding differently to ads, with Google's targeting algorithm responding to the difference (e.g., Google learned that males are more likely to click on this ad than females are)
More competition existing for advertising to females, causing the advertiser to win fewer ad slots for females
Some third party (e.g., a hacker) manipulating the ad ecosystem
Some other reason we haven’t thought of.
Some combination of the above.
Transparency
Transparency means being open and clear to the end user about how an outcome, e.g., a classification, a decision, or a prediction, is made. Transparency can enable accountability.
The massive amounts of data that third parties collect about our behavior mean that others have more information about us than we have about ourselves. This lack of transparency between data collectors and data subjects underlies the “inverse privacy” problem: the inaccessibility of data that others have collected about us.
Transparency
The EU GDPR’s “right to explanation” calls for transparency of data-driven automated decision-making (from Article 13, paragraph 2):
2. In addition to the information referred to in paragraph 1, the controller shall, at the time when personal data are obtained, provide the data subject with the following further information necessary to ensure fair and transparent processing:
. . .
(f) the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.
“Meaningful information about the logic involved” suggests that data collectors are required to provide data subjects with some kind of justification.
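What counts as “meaningful information about the logic involved” is a legal question that the slide leaves open. As one illustration only, assuming a simple linear scoring model, each input's contribution to the score is just weight × value, so a decision can be reported together with the inputs that pushed it up or down. The feature names, weights, bias, threshold, and applicant below are hypothetical, and the sketch is not a statement of what the GDPR requires.

```python
# Hypothetical linear scoring model; feature names, weights, bias and
# threshold are invented for illustration and do not come from any real system.
WEIGHTS = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}
BIAS = -0.2
THRESHOLD = 0.0


def decide_and_explain(applicant):
    """Return the decision, the score, and per-feature contributions ranked by impact."""
    # For a linear model each feature contributes exactly weight * value.
    contributions = {f: w * applicant[f] for f, w in WEIGHTS.items()}
    score = BIAS + sum(contributions.values())
    decision = "approve" if score >= THRESHOLD else "deny"
    # Rank features by the magnitude of their effect on the score.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return decision, score, ranked


def explanation_text(applicant):
    """Render the decision and its ranked contributions as plain sentences."""
    decision, score, ranked = decide_and_explain(applicant)
    lines = [f"Decision: {decision} (score {score:+.2f}, threshold {THRESHOLD:+.2f})"]
    for feature, c in ranked:
        if c > 0:
            lines.append(f"- {feature} raised the score by {c:.2f}")
        elif c < 0:
            lines.append(f"- {feature} lowered the score by {-c:.2f}")
        else:
            lines.append(f"- {feature} did not change the score")
    return "\n".join(lines)


# Hypothetical applicant with pre-standardized feature values.
print(explanation_text({"income": 1.2, "debt": 1.5, "years_employed": 2.0}))
```

Ranking by absolute contribution is one design choice among several; for non-linear models, contributions are no longer simply weight × value, which is part of what Explainable AI methods try to address.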
Explainable AI
Ethics
Ethics for data science means paying attention both to the ethical and privacy-preserving collection and use of data and to the ethical decisions that the automated systems we build will make.
1. The ethical issues relate to fairness, accountability, and transparency with respect to the data collected about individuals and organizations. What data need to be collected, and for what purposes are the data intended to be used? How transparent to the end user are these policies?
2. Machines will be programmed to make ethical decisions, some of which have no right or wrong answer. The canonical “Trolley Car Problem” raises the ethical question of whether it is better to kill one person or five. The dilemma is that there is no right answer to this question.
Safety and Security
Safety and security means ensuring that the systems we build are safe (do no harm) and secure (guard against malicious behavior).
If we cannot ensure their safety, consumers will not trust them. One longstanding technical challenge is to verify the safety of a digital controller interacting with a physical environment in the presence of uncertainty: one must reason about a combinatorial number of possible events and many relevant high-dimensional variables.
A newer technical challenge is to verify AI systems trained on big data; for example, a smart car's cameras feed computer vision models based on deep neural networks (DNNs). Examples of incorrect behavior include self-driving cars crashing into guardrails. This dimension aligns with the need for accountability and transparency of machine learning algorithms and models.
Safety and Security
Data science raises new security vulnerabilities.
Not only do we need to protect our networks, computers, devices, and software, but we now also need to protect our data and our machine learning algorithms and models.
Attackers can tamper with the data, producing a model that makes wrong decisions or predictions. The field of adversarial machine learning studies how malicious actors can manipulate training and test data and attack machine learning algorithms. The distinctive context here is that the algorithms operate in an environment that adapts to, and learns from, the system's behavior in order to wreak havoc. Whereas for safety our trained systems need to work in unpredictable environments, for security they must work in adversarial ones.
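To illustrate the test-data side of adversarial manipulation, here is a minimal sketch of an evasion attack on a hypothetical linear detector. For a linear score w·x + b, moving each feature by ε against the sign of its weight lowers the score by exactly ε·Σ|w_i|, the largest drop achievable when each feature may change by at most ε; this is the idea behind gradient-sign attacks on more complex models. The weights, bias, input, and budget are invented for illustration, and training-data poisoning is not shown.

```python
# Hypothetical linear detector: score > 0 means "flag as malicious".
# Weights, bias, input and budget eps are invented for illustration.
W = [1.0, -2.0, 0.5]
B = -0.5


def score(x):
    """Linear score w . x + b."""
    return sum(w * xi for w, xi in zip(W, x)) + B


def sign(v):
    """Return 1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def evade(x, eps):
    """Shift each feature by at most eps in the direction that lowers the score.

    For a linear model this lowers the score by exactly eps * sum(|w|),
    the largest drop possible under a per-feature (L-infinity) budget.
    """
    return [xi - eps * sign(w) for w, xi in zip(W, x)]


x = [1.0, 0.0, 1.0]
x_adv = evade(x, eps=0.3)
print(score(x), score(x_adv))  # the flagged input is no longer flagged
```

The perturbation is small in every coordinate, yet it crosses the decision boundary; for a learned model, an attacker would typically estimate the gradient direction instead of reading the weights directly.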
Conclusions
Shangri La is not just a technical and/or technological issue
Analytics will happen anyway.
There is no single solution for all demands.
One professional profile does not fulfill all demands
Technology keeps advancing fast, despite some clear definitions.
The relevance of algorithms also comes with responsibilities.
Making algorithms compatible with ethics and legal requirements may be hard
Research and development opportunities at all levels.
Optimistic view about CS and its impact on society. Another opportunity!