The Power of Data Analytics
Analytics has turned data acquisition into the focus of the market. However, we still lag behind in terms of results.
As one of the fastest-growing innovation centers in the world, the APAC region is attracting investment from companies and entrepreneurs around the globe. With the region’s market growth sitting around four percent and an ever-increasing culture of improvement in software and services, companies are increasing their innovation spend to keep up with competitors. Because of this growing spirit of innovation, Singapore, India and China are set to experience significant boosts in growth, with Japan and Australia set to grow half as fast. Increasing thought leadership, continuing investment in R&D and a large talent base make APAC a prime area for innovative technology. As far as opportunities go, Singapore is a vibrant, innovative center of growth with strong financial services, stability, market access and a business-friendly government. It is known for its diverse workforce and multilingual business support.
Read the report here.
Organizations have always used data to guide business decisions, but with the arrival of the internet, increasingly powerful analytics tools have provided an opportunity to understand more about the sectors in which they operate. Today, companies are sitting on terabytes of data that cover a wide spectrum of important factors, ranging from customer behavior and market trends to raw information about future developments.
The potential value of this data is so great that analytics itself has become big business. It has created a new industry for number crunching and caused a major shift in IT-related jobs. But even in this data-fueled environment, a big question remains: are organizations making the most of the information at their fingertips?
Dr. Catarina Sismeiro is an associate professor at Imperial College Business School, where she teaches on the Executive MBA program as well as the Business Analytics and Strategic Marketing master’s programs. Dr. Sismeiro argues that many businesses have adopted a twin-speed approach to analytics. Complex algorithms are used to boost operational efficiency, cut costs, and track customer behavior, but only a select few are using what they learn to drive strategic direction.
“The main issues are a lack of data-centric culture, not enough willingness to rely on algorithms or data analytics for strategic insights, and the absence of a strategic plan for data-driven insights, especially at the top level,” she explains. “Although the evolution for a data-centric approach at operational levels started long ago, pushed by the need to improve efficiency due to fierce competition, changes at the top strategic level have been slower.”
Is the Board on board?
There is a long list of brilliant data implementations, such as yield management systems pioneered by American Airlines, which Dr. Sismeiro says led to an immediate uptick in profitability for the company. All major car rental companies, hotels, and airlines now use similar systems to cut waste and create efficiencies.
While this is taking place on the ground, C-suite executives are perhaps proving less adept at making the change. The key is for senior staff to be able to derive action points. It’s a process that moves from raw data to information, to insight, to action, and ultimately to business impact.
The transition to data is also happening organically, as a growing number of digital natives are promoted up the chain of command. It is telling that large dotcoms jam-packed with millennial talent—like Facebook, Amazon, and Google—are leading the charge. But it will also take an organizational reshuffle at the top, with greater coordination between CMOs, CIOs, and CTOs. And with new rules affecting data gathering about to hit the EU, there will also be a greater collaborative role for Chief Privacy Officers (CPOs), and Chief Data Officers (CDOs).
David Morgan, human resources director, EMEA at Kronos agrees: “The more transparency you have, the more likely the business is able to drive better planning, forecasting and employee utilization and engagement. Good decision-making based on what is happening in marketing and data supports improved prioritization based on demand, need, and where to spend.”
Data mining means delving deeper
For growing businesses lacking the large IT budgets of global corporations, the challenge is to deliver relevant insights from the data at hand. Often, this means filtering out the information and refining it to a granular level.
“Data cleanse is crucial before acting on a poor metric, particularly if you're looking to spend a fortune on A/B testing or CRO audits. Simple things like filtering data for your target region can go a long way to making KPIs healthier,” says Marc Swann, search director at the digital agency Glass Digital. “There's a danger in acting on top-level metrics. Dig deeper to see if you can get more insight into exactly what's going on, and you're more likely to make the smartest decision. Segmenting data by page type might show that your product pages are doing well, but your help guides are dragging the average down because users visiting these pages don’t have purchase intent. This context will direct your efforts into retargeting campaigns and better calls to action on blog posts, rather than funneling unnecessary investment into product pages.”
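The segmentation Swann describes can be sketched in a few lines of pandas. This is an illustrative example, not Glass Digital's method; the column names (`page_type`, `sessions`, `conversions`) and the figures are made up:

```python
import pandas as pd

# Hypothetical analytics export: aggregate metrics per page.
df = pd.DataFrame({
    "page_type":   ["product", "product", "help_guide", "help_guide", "blog"],
    "sessions":    [1200, 900, 1500, 1100, 800],
    "conversions": [60, 45, 5, 3, 8],
})

# The top-level conversion rate hides which page types drag the average down.
overall_rate = df["conversions"].sum() / df["sessions"].sum()

# Segmenting by page type shows where purchase intent actually lives.
by_type = df.groupby("page_type").sum()
by_type["conv_rate"] = by_type["conversions"] / by_type["sessions"]
print(f"overall: {overall_rate:.3f}")
print(by_type.sort_values("conv_rate", ascending=False))
```

Here the product pages convert well while help guides pull the blended rate down, which is exactly the context that redirects spend away from unnecessary product-page investment.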
Making data work harder
Growing businesses should consider sweating existing data before splashing out on new insights. According to Dr. Sismeiro, internal data is often surprisingly rich and it’s free. In many cases, companies just need to organize, integrate, and store it for easy access.
For organizations wanting to get more from their data, Dr. Sismeiro has further advice—don’t underestimate the important contribution employees can make. She recommends empowering all employees, not just analysts and data scientists, to use data and extract strategic insights.
Organizations should remove “data silos” and work harder to integrate online and offline data to make it easier to draw out insights that are relevant to the business as a whole. This can be done quickly in small projects that when combined will have a big impact. It’s better to act this way than to spend a lot of time bringing everything together all at once.
Investing in systems is, of course, an important part of updating a data strategy. Legacy systems might not be capable of creating the new holistic approach. But this need not be a big up-front investment, due in large part to the affordability of cloud applications and Software-as-a-Service (SaaS).
Finally, it’s important to share data successes. If teams understand the power of data and the results it can generate, they are more likely to respond and buy in to organizational goals. Bigger companies should consider creating a dedicated team to help instill an “insights culture” and promote the progress being made in the business as a whole.
With the right approach, all organizations can benefit from analytics. According to Dr. Sismeiro, businesses everywhere are waking up to data’s full potential. “We can now find many examples of businesses that use analytics to discover and assess new business opportunities, find what customers say about the firm or products, unveil market segments and make product recommendations, aid decision making, improve logistics and increase efficiency, influence voters, and even control manufacturing and protect crops.”
Organizations that embrace the power of analytics will drive ahead of their rivals in the data-driven business environment. Those that fail to do so will be forced to take a back seat and watch as their rivals pass them by.
Too often we gather the data and do the mathematics (the analytics), and then assume that actions will happen. But generally that’s not the case. Our approach needs to fundamentally change so that we carry through to the point of action. Hear what Steve Jones, Global VP, Big Data at Capgemini, has to say.
The new consumer expectation is a perfectly tailored experience, and consumers want it now, as long as it is relevant and delivered at the right time.
Much is written about the pace of change in our lives, the shift in power to the people, the time we devote to devices and social channels...
Marketers must stretch resources across dozens of channels and devices, identifying micro-triggers among groups that used to be targeted en masse. A new level of targeting is needed, along with a new level of team enablement to democratize analytics more deeply within marketing.
But also, new data presents an unprecedented opportunity to learn and build intimacy with customers. Mastering insight at a new level is becoming a competitive imperative to be built into your DNA.
We examine how organizations and marketers can leverage this immense trail of data to offer customer value. Read our POV.
How are you managing your customers’ expectations? Let’s discuss. @ruurddam
Still remember our rogue kickoff for 2016 with seven key trends in the world of BI, analytics and AI? Pretty accurate they were, no? Anyway, much has changed in just a year and we have much to look forward to in 2017, with data being the undisputed driver of any serious business change effort. Just like last time, we have carried out a quick survey among our global Advisory & Architecture community to identify the hottest topics in insights & data - the ones that we feel will have a real impact on businesses.
It’s a motley crew of trends here, as you'll see, with major themes such as automation, enterprise-scalability, cloudification and (surprise) the rise of Machine Intelligence making it to the top of the pile.
So here’s what you need to know for 2017:
1. BI Curious
The recent wave of technology innovations – driven by Big Data – is now starting to enter the mainstream of ‘plain’ BI as well. The world of reporting & dashboards and descriptive & diagnostic analytics is increasingly benefiting from a powerful mix of cloud, open source solutions, self-service platforms, advanced visualization, collaboration tools, automation, cognitive interfaces and AI-assistance. A pretty compelling picture to explore, with convincing cost and productivity benefits as well as improved agility and effectiveness. It will give the seemingly established BI landscape a radically modernized, vanguard face, enchanting business users and solution developers alike.
2. The Empire Strikes Back
As is often the case, the disruptive innovations in data are coming from the open source and start-up communities. And it has taken the major industry players arguably way too long to catch up. But rebound time, it is now. Technology leaders such as Microsoft (look at how they embrace the open source ecosystem), SAP (bridging HANA and Big Data platforms, having acquired their own Hadoop-as-a-service provider), IBM (yes, there is life outside Watson) and SAS (going cloud and Big Data with their new Viya platform) are at full speed.
And the interesting thing? They are merging their enterprise-scale, high-productivity tools with innovative, new technologies. The best of both worlds. In any galaxy, really.
3. Ceci n’est pas un Éléphant
Don’t know what an elephant has to do with Big Data? Go work on your street cred. For all the others, it’s good to realize that nowadays the Hadoop ecosystem is no longer just about – well – Hadoop. Although arguably the entire Big Data revolution started with the ability of the Hadoop Distributed File System (HDFS) to store and provide access to huge amounts of data, structured or unstructured, it’s now the powerful set of analytical tools on top of it – such as Spark, Storm and HBase – that really provides the new value. Expect your shiny Hadoop servers therefore to disappear sooner or later: first into the cloud, provided as a scalable, on-demand service, and ultimately replaced by other – even more effective – storage and access services.
4. At Your Service
One does not just become insight-driven by hiring data scientists and data engineers. Every business person should become a bit of a data expert, perhaps even a ‘citizen’ data scientist. The best insights are created in near proximity to the business and for that, data must be discovered, prepared, analyzed and visualized by the business. It requires a highly automated data ‘pipeline’ that gives agile access to the right data – all the way from its ingestion, while ensuring security, privacy and enterprise quality. It also requires easy-to-use, self-service tools that power the business to take matters into their own hands and work together to become insight-driven.
It will certainly also depend on an increasing level of Machine Intelligence, to help business users to identify and prepare exactly the right assets in their corporate data lakes (watch Informatica for this with their Live Data Map). Expect to hear more of the concept of the Data Concierge, as your intelligent 'one-stop shop' for data.
5. You Do The Math
We know, we know. Data science is not necessarily about math. But it sure sounds good as a trend headline, doesn't it? And although the BI people are rightfully claiming their part of the new data landscape, the future is definitely in algorithms. Algorithms that help make much better-informed decisions, predict what will happen and even prescribe what should be done to achieve objectives across the end-to-end business. An eclectic catalogue of algorithms could be the most differentiating business asset, whether pertaining to the customer experience, internal operations, human resources, risk, fraud or physical assets (IoT almost equals algorithms).
It requires understanding the full analytics lifecycle, building the new skills that are needed and gradually establishing a true analytics culture across the organization. In practice, a combination between a top-down vision of what the enterprise wants to achieve through algorithms, and a hands-on, applied way of getting busy with early applications and results will work best.
And there is a quickly growing market of sector and domain algorithms out there as well. Algorithms that are ready to be used, right out of the box. So you don’t need to science your way out of this all on your own.
6. The P&L of Trust
Particularly if you are in Europe, the upcoming General Data Protection Regulation (GDPR) is likely to have a major impact on your insights & data plans for 2017. If only to comply by the May 25, 2018 deadline, as fines can reach 4% of your organization’s annual global revenue. It’s what some might consider a valid business case. A lot of work may need to be done, in terms of securing both the privacy of individuals and the security of their personal data. It may involve new technologies, new roles, new organizational structures and a thorough overhaul of the existing data landscape.
Admittedly, it seems like a nuisance.
But on the bright side, companies that do it well have the unique opportunity to reach out to their customers, be transparent and conversational about the use of their personal data and – ultimately – use trustworthiness as a competitive differentiator. From that perspective, any strategic decision in the data context should be considered in terms of its impact on the bottom line of trust. If we’d been completely without inspiration, we might have called this crucial trend ‘Trust Is The New Oil’ (but of course we didn’t).
7. Max Machina
Breakthroughs in deep learning and raw computing power are fueling the renaissance of AI and Machine Intelligence (there is Machine Intelligence 3.0 now; guess we had that coming). And it’s so much more than yet another drop in the ocean of hypes. Of course, you can go for the Full Watson and aim for a drastic overhaul of your business model. But this will take time and a relentless focus. In the meantime, you may want to consider exploring conversational technologies (such as Facebook’s Messenger platform or the Microsoft Bot framework), embed AI in your business applications (Salesforce would agree), apply cognitive technologies to your unstructured text (for example to understand what is in a complex contract with RAVN) and even use it to optimize your own IT processes (smart automation won’t be able to do without). You know, the latter if only as a way of drinking your own champagne.
What are the trends you are witnessing? Look me up at @rtolido
Data scientists pride themselves on knowing every programming language under the sun and every library available, and on being able to work purely in code, like the Zion operators from The Matrix. They are also control freaks who like to tweak and fine-tune their models like F1 racing cars. So, with this in mind, how do “drag and drop” tools fit within the data scientist’s toolbox?
With recent developments in “self-service” technologies covering a majority of the tasks a data scientist will typically encounter, there is much on offer that can significantly speed up how a data scientist “wrangles” data, experiments with models and shares insight. Combine this with the ease of deploying some of these “self-service” technologies and the ability to access them remotely over a web-based interface, and significant user exposure and efficiency gains are potentially within “easy” reach.
A number of vendors providing “self-service” drag and drop tools / services have partnered with public cloud service providers such as Azure and AWS. In addition, some cloud service providers offer their own pay-as-you-go Software as-a-Service options adding to the flexibility of using a range of different tools whilst being in control of costs (i.e. turning on and off services) and compute resources.
Anyway, back to the question at hand: how do these fit into a data scientist’s toolbox? Well, that depends on the type of data scientist you are, whether you are a seasoned pro, a budding novice or anything in between, and the type of task you are exploring. Below are a few examples of where these “self-service” drag and drop solutions could fit in and help drive efficiency.
Data “Wrangling” and / or preparation
A major drain on a data scientist’s time is preparing data, from re-formatting fields to mapping datasets and conducting data quality checks. Traditionalists will generally dive right into programming these tasks in languages such as Python and feed the scripts into an end-to-end (E2E) automated workflow. The trouble with doing this programmatically is that coding errors take time to debug, data processing and transformation steps are less transparent, and implementing fundamental changes increases the risk of code re-writes (amongst other potential issues).
In addition, working programmatically you only see the data in its raw format, which does not lend itself to spotting data quality errors or quickly identifying how the data is distributed without running further code, either to visualize the data or to automate the identification and logging/alerting of potential errors and outliers. This additional code also brings the potential issues mentioned above.
Many of the “self-service” tools geared towards data wrangling (Trifacta and Dataiku, to name a couple) automatically let you visualize a sample of the data to provide an early indication of data quality (i.e. how complete the data is), auto-assign data types to fields where schemas are not available, and provide quick views of how the data is distributed. This is all without writing a single line of code, speeding up decisions about how to handle the data and which approach will best achieve the desired outcomes.
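For comparison, the programmatic route to those same early checks might look like the following pandas sketch. The dataset is hypothetical, with no schema supplied, so types must be inferred and bad values coerced by hand:

```python
import io
import pandas as pd

# Hypothetical raw extract: a missing spend value and a malformed date.
raw = io.StringIO(
    "customer_id,signup_date,spend\n"
    "1001,2017-01-05,25.50\n"
    "1002,2017-01-07,\n"
    "1003,not-a-date,19.99\n"
)
df = pd.read_csv(raw)

# No schema available, so coerce the date column; bad values become NaT.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# Completeness check: share of non-missing values per column.
completeness = df.notna().mean()

# Quick distribution view, the code equivalent of a tool's auto-profile.
summary = df["spend"].describe()
print(completeness)
print(summary)
```

Each of these steps is extra code to write, test and maintain, which is precisely the overhead the drag-and-drop tools remove.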
They also provide “drag and drop” functionality to implement typical transform tasks such as joining, aggregating, text manipulation and many others. For more complex data, they offer clever algorithms to detect how the data should be structured and automatically suggest transformations. Trifacta even employs machine learning to improve its suggestions.
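The join and aggregate nodes on a drag-and-drop canvas map directly onto code; a minimal pandas sketch with made-up tables, not any vendor's generated output:

```python
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame({
    "order_id":    [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount":      [50.0, 30.0, 20.0, 70.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region":      ["EMEA", "APAC", "EMEA"],
})

# Join (the canvas's merge step), then aggregate revenue per region.
joined = orders.merge(customers, on="customer_id", how="left")
per_region = joined.groupby("region")["amount"].agg(["sum", "count"])
print(per_region)
```

On the canvas these are two connected boxes; in code they are two statements that must be written, ordered and documented by hand.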
The development of the steps and process to prepare the data is recorded as a visual workflow, which provides a nice and easy visual way to check that the steps are logically ordered and to quickly get an overview of what is happening to the data. This can make identifying where problem manipulations occur that little bit quicker.
For complex data wrangling tasks, being able to break down the solution into visible chunks can make the development process slicker and make these tasks less daunting for those developing their Data Science skills.
This “workflow” will then auto-generate SQL code specific to the selected backend database connection. Depending on the chosen solution, this code can be run in the solution itself or exported (with auto-generated annotations/comments) and orchestrated within a complex environment using other tools such as ActiveEON.
Rapid Model Prototyping and Selection
Within some solutions (Dataiku and Azure Machine Learning, for example) the data flow can be extended to feed into various predefined machine learning engines without the need to write a single line of code. This allows you to quickly test out various algorithms and compare performance across different engines and languages (for both accuracy and speed), helping to home in on the best-suited approach to different problems.
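The side-by-side comparison these tools offer can be reproduced in a few lines of scikit-learn. This is an illustrative sketch on a bundled dataset, not any platform's generated code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate algorithms, scored identically so results are comparable.
models = {
    "logistic": LogisticRegression(max_iter=5000),
    "tree":     DecisionTreeClassifier(random_state=0),
    "forest":   RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validated accuracy for each candidate.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {score:.3f}")
```

The drag-and-drop equivalent wires the same comparison together visually and handles the scoring loop for you.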
Again, the auto-generated code can be exported and/or amended and customized to fine-tune the model’s performance or tailor it to a specific solution. This also provides a nice starting point for budding data scientists or those looking to implement methods they have not used before.
For the traditionalists who like to write customized models, customized code written in a variety of programming languages can be imported and included in existing workflows making the process of switching between out of the box and custom models a simpler affair. This helps promote and de-risk the ability to experiment with different techniques and support rapidly prototyping different solutions without the need for significant re-engineering of code and workflow.
Whilst creating the workflow, you are also inherently orchestrating the tasks, taking away the need to do this when it comes to putting your solution into production. As previously mentioned, a nicely formatted and commented export of the code for your entire workflow can be included as part of an orchestration tool such as ActiveEON, or the code taken apart if each element needs to be orchestrated separately.
Data and Insight Driven Visualizations
OK, so you now have a way of classifying your data, or a nice set of insightful data. How do you go about surfacing it in a meaningful way for decision makers? One of the quickest ways to get the message across is through user-driven visualizations. Writing lines of code to correctly visualize ad-hoc charts and plots in R or Python comes with its challenges and ultimately consumes time.
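For a sense of that overhead, even a basic ad-hoc chart in Python takes deliberate code; a matplotlib sketch with made-up monthly figures:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical monthly sign-up figures.
months = ["Jan", "Feb", "Mar", "Apr"]
signups = [120, 150, 170, 160]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(months, signups, color="steelblue")
ax.set_title("Monthly sign-ups")
ax.set_ylabel("Sign-ups")
fig.tight_layout()
fig.savefig("signups.png")  # every tweak (labels, size, format) is another line
```

Every change of chart type, label or layout is a code edit and a re-run, where a drag-and-drop tool would be a click.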
This is where self-service data visualization and visual analytics tools come into play (e.g. Power BI, Tableau, Qlik Sense). These tools can connect to nearly all mainstream data sources, or any source that is ODBC/JDBC compatible, and quickly visualize the data through a “drag and drop” interface. Most have server- or cloud-hosted versions that you can publish to, sharing your creations via a web browser. These tools make chopping and changing between chart types and formats a breeze, so cycling through a range of typical chart and plot types becomes a simple task. In addition, they inherently create interactive visualizations that can all be connected, enabling end users to drill down quickly into the specifics they are interested in.
With these tools, prototypes can be developed rapidly that in turn support the de-risking, guidance and refinement of user requirements on how best to communicate visually to the business the outputs from all the clever backend data cleansing, wrangling and modeling elements.
So has the age of “drag and drop”, “code free”, “self-serve” data science arrived? Will data scientists never need to touch or learn Python again? Did the UK vote to leave the EU and will Trump be voted in as President?
Well the latter two came true so why not the first two!
Fear not, fellow data scientists, whilst these new tools offer significant time saving and efficiency benefits, they will never replace the toolsets of true data scientists – only complement them.
However, they do offer a nice and attractive way into data science for those who want to tread down that path. For seasoned data scientists, they provide a capability to rapidly prototype approaches to data science problems. They also add more transparency to data science projects by laying out the end-to-end process in a clear and consistent way, exposing how each step feeds into the others (assuming, of course, there is a method to the madness!).