At a recent DEVintersection conference in Las Vegas, our own Markus Egger sat down with his fellow Regional Director Ciprian Jichici to talk about machine learning, AI, cyber security and malicious attacks, quantum computing, and much more.
Not much time has gone by since that talk with Ciprian, yet much has changed. A few days after discussing cyber security, CODE Magazine was the victim of a major cyber-attack (something we’ll talk about more in upcoming issues of this magazine). Many of our readers will perhaps have noticed some interruptions in our business and in our ability to get all magazines out on time. This incident made it all the clearer that security is a very important and timely topic. The topic of Quantum Computing also ended up being interesting for an exciting reason. Markus wasn't able to talk about it at the time, but FX on Hulu's new television series “Devs” revolves exactly around this topic, and CODE Magazine was involved (see our March/April 2020 issue). Then, in far more sobering news, the SARS-CoV-2 virus epidemic hit the world. Topics touched on in this conversation, such as facial identification, have a direct connection to some countries being able to handle the spread of the epidemic better than others. The virus also ended up cancelling many events (or forcing them to be online-only), including the Microsoft MVP and RD Summit in Redmond, thus making it all the rarer for RDs to be able to meet in person. Nevertheless, this column will continue, as we will find a way to get RDs to talk and for us to listen in. For now, let's listen in on what was discussed.
Markus Egger: Ciprian, good to see you! Nice to get together every so often.
Ciprian Jichici: Yes, good to see you too!
Markus: You’re in a super interesting field. We see each other at the MVP and RD Summit and occasions like that, but don't have that much opportunity to talk other than online, which is why I’m excited that we get to sit down together here in Vegas. You deal with AI, machine learning, data science, and cyber security, and you have a big passion for those fields. We're here at DEVintersection in Las Vegas where we are both presenters. Tell me a little bit about your key areas of interest.
Ciprian: These days I do a lot of work in the field of practical machine learning. I also do a lot of work in data science-related projects, mostly, data cleansing, data engineering, data enrichment, and things like that. I feel so fortunate about the place where I am, from a professional point of view, because this has been my lifelong hobby. And 20 years after I started my career, I'm at a point where I actually do most of my productive work in the field of either machine learning, data science, or data engineering.
And, as much as it’s a great place to be in today, it also has a lot of, let's call them, “shades of gray.” One of the things that bothers me the most when it comes to AI, machine learning, and data science is the hype. Hype is probably the one thing that does the most to derail the real purpose of data science. Even with large conferences like this one, you see a lot of hype being created. And, even worse than that, you see a lot of hype being consumed. People have these tendencies. Sometimes it feels to me like people need to believe that AI and machine learning are these magic wands that you can just wave and almost anything you wish will happen. This couldn't be further from the truth.
Markus: We've discussed this problem before, and people have a lot of half-knowledge. They see some mind-blowing scenarios working and conclude that anything is now possible. I think there's probably a lot of work left on our end. It’s us, as presenters, authors, and educators, as well as consultants, who need to set expectations a little more realistically. And I know you're doing a tremendous amount of work in that direction.
Ciprian: Yeah, absolutely. That's absolutely right. People have this biased idea that it’s easier to become a seasoned professional in data science than in other fields. And honestly, I can't think of any reason why it would be easier to become a data scientist than it would be to become a physicist, for example. There's another example I typically like to use to illustrate this. It takes perhaps a couple of weeks to learn to drive a car, but acquiring the ability to drive a Formula One car or an IndyCar is something that needs to be learned from the earliest possible age. It's pretty much the same with all fields, data science included. It's not that easy to break into the field.
As a matter of fact, there's hype around getting into the field as well. People think the entry barrier for data science is much lower than for other fields. I don't believe that. It takes pretty much the same amount of effort: still around those 10,000 hours of practice that you need to gain a proper level of knowledge and a proper level of experience. And where you see this the most in data science and machine learning is not necessarily in the pure theory, the pure math, or the statistics behind it. You see it when people need to make decisions. Because, believe it or not, data science is a lot about trial and error. And there are many, many cases where you simply don’t have all the moving parts, all the hard figures you’d need to make a 100% informed decision. And that's where experience kicks in. That's when those thousands of hours of previous experience become very, very important.
You still need about 10,000 hours to gain a proper level of knowledge to be a data scientist.
The other thing that I see a lot in the industry is a phenomenon that I call the “wanna-be data scientist.” Because there’s a huge lack of skill in the field, and a huge lack of skilled human resources, you see organizations and customers giving work to these people. Obviously, it’s anybody's right on this planet to do any kind of work they want, and I'm not commenting in any way on that. The problem is that most of these projects have a tendency to become either challenged or failed projects. And the immediate repercussion of that is a backlash from business decision-makers. This isn’t just a theoretical problem. I've already been there. I've experienced it talking with high-level business decision-makers. They say, “Hey, Ciprian, I've seen you at this conference. I know you are a seasoned professional, but don't get me wrong. We've been burned already in these types of projects, and we've decided to sit this one out and wait a little bit to see what happens.”
Markus: They expect a proper return on investment. I mean, the potential return on investment is huge, right? But you need to make sure you get there. We worked with a customer that deals with perishable food items. Things like salads going bad are a big problem in their business, so that's what they’re trying to manage. And if you can optimize that business so that 5% less waste occurs, that's huge! And you can really make that happen in many cases. But as you say, there is a real degree of craftsmanship. First of all, what questions do you ask? We have a customer in the oil and gas industry who wanted to know how they could detect early when drill-heads are going to fail as they drill oil wells. The data scientists and the AI they used came to the conclusion that the deeper they go, the more likely the drill-head is to break. It boggles the mind how somebody thought that was an interesting answer rather than completely obvious and not useful. So yes, craftsmanship is a big thing there.
Let's take this a step further. Right now, there's the theoretical level, the data analysis, the data science, and of course, people get a lot of benefit from pre-trained models that they can use. So a lot of it is about using what's there. And then a lot of it is creating custom models. You maintain your data and clean your data properly. Out of all those things that you can do, what are the biggest areas you see? At my company, we work in computer vision and similar things. But I know you cast a wider net. You even go into quantum computing and cyber security, and I know you're very passionate about that.
Ciprian: Yeah. Well, speaking of all that, you could call me a failed physicist. Most probably if I hadn’t ended up being an IT guy, I would have ended up somewhere in quantum physics. That's one of the fields that’s been my passion ever since I was a child. And it's amazing to see that that's now becoming a thing in terms of quantum computing. Coming back to your question though, at the present time, I do a lot of work in exploratory customer behavior analysis. There are some very interesting developments. There are lots and lots of companies who have a real need, a measurable need, for these types of solutions.
And by the way, I think you touched on a very important point earlier, which is still challenging in machine learning today: identifying the compelling business case. You have to fight so much hype today. You have to fight so many wrong perceptions. Then, when it comes to actually building a business case that’s based on hard, measurable returns on investment, one that enables the customer to really measure the performance of your data science, we see many, many teams and many companies failing at this very point. They fail when it comes to providing a compelling business case.
We also do a lot of work in natural language processing and text analytics. This is another very, very interesting area that we’re heavily involved in. We do very practical and hands-on types of projects, ranging from classification based on natural language processing, all the way to processing media and news data.
Markus: You essentially take in an audio stream, right? You start not with the written word, but with audio of someone making a statement. Potentially, you do transcriptions and then take it from there, or how should we imagine that?
Ciprian: It's a little more complicated, or a little more elaborate, than that, in the sense that we mostly work with text from online written media. One of the challenges we’re aiming to solve is understanding the real structure of information out there in the wild. These days you see a lot of initiatives around classifying news as fake or legitimate, and things like that. That’s not too harsh a term, but I think it's a little bit simplistic. The types of projects we’re involved in are ones where we aren’t looking to provide a measure of quality for what’s being said out there in public. Rather, it’s about estimating the support for any statement. Think about a simplified situation like this: “Markus said, ‘it's raining in Vegas’.” What we're trying to do is not assign a certain degree of true or false to the statement, but rather find out that, for example, in the western part of the United States, there were 525 other people saying a very similar thing, something that’s very close, semantically and linguistically, to the statement that it’s raining in Vegas. It might come as a surprise, but this is the type of thing a lot of companies are interested in. Mostly, the companies with a well-developed social consciousness are the ones who are really interested in what’s being said: What’s the perception about them? Yes, the first level is sentiment analysis, but I believe this is a deeper level, and a level that’s also much more difficult to achieve.
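Ciprian's idea of estimating support for a statement, rather than labeling it true or false, boils down to counting how many other statements are semantically close to it. As a rough illustration only (his actual pipeline isn't described here), a toy version can be sketched with bag-of-words cosine similarity. The `THRESHOLD` value and the sample corpus are invented for this example; a real system would use proper sentence embeddings rather than word counts.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two statements using bag-of-words counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

statement = "it's raining in vegas"
corpus = [
    "heavy rain in las vegas today",
    "it is raining in vegas right now",
    "sunny skies over phoenix",
]

# "Support" = how many statements in the corpus are close to the reference.
THRESHOLD = 0.3  # arbitrary cut-off, chosen for illustration
support = sum(1 for s in corpus if cosine_similarity(statement, s) >= THRESHOLD)
```

In practice, word overlap is far too crude to capture “semantically and linguistically close”; embedding vectors would replace the `Counter` vectors, but the counting logic stays the same.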
This is another very practical area in which we work. We also do work in IoT. As a matter of fact, in addition to speaking at the conference, I, together with my colleagues and some clients, delivered a two-day workshop on combining IoT with machine learning. This is a very interesting field and there’s already a lot happening. We started with pure IoT and the challenges of typical IoT solutions, but then we moved into how these solutions can be improved: how they can benefit from applying machine learning approaches to the vast amounts of data, mostly streaming data, that are generated in the world of IoT.
Markus: To switch gears a little bit, what do you make of the threats of all of this? A lot of people go, “Oh, I know AI is going to take over the world.” They expect a Terminator-like scenario. I'm not a big believer in that, but I do worry about a different angle of all of this: the deep fakes, the derailing of factfulness, the more sophisticated phishing messages people encounter. Isn't there a real issue in attacks of that nature becoming so sophisticated that it's really difficult to battle them? I do wonder whether AI can help us battle them as well, but it seems that's a lot more difficult than coming up with the actual malicious content.
Ciprian: Yeah, it is. And by the way, that “Terminator Moment,” we have a name for that in machine learning and data science. It’s called the “Singularity.” It's that very moment when a machine, or a set of machines, exceeds the capabilities of the human brain in all aspects. It was projected that the singularity would occur sometime around 2010, then the estimate was updated to 2020, and now it's been moved to around 2050. I'm not a believer in that either. But I'm a believer in the duality of any kind of powerful technology or mechanism or theory that we—humankind—get access to. Because it's always a game of the bright side and the dark side. Obviously, my take is a balance. You shouldn’t be overexcited about everything you can do with machine learning and AI. Neither should you be deeply rooted in the dark side, like “this is going to kill the world,” right? If you look throughout our history as a human race, it's always been like this. Think about... oh, I don't know… E = mc2, perhaps. That equation brought us nuclear energy, which is still one of the cleanest types of energy we know of. But it also brought us the nuclear bomb. So I have no reason to believe that we’ll deal with something as powerful—and make no mistake, machine learning and AI are very powerful things—in a radically different way. And you said it very well: The computing power that we have today at our fingertips is already being used to do harm as well, via machine learning. It might come as a surprise to lots of people, but as of today, we only need maybe 20 to 30 minutes of a decent voice recording of somebody to use sophisticated machine learning to make that person say virtually anything we want. And it's not only audio; it's also possible with video.
I’m not a believer in the “Singularity,” that “Terminator Moment.” But I am a believer in the duality of any kind of powerful technology.
Markus: Yes, absolutely. Which is very interesting, when you think of what's going on politically in the world, among other things.
Ciprian: Exactly.
Markus: Or even just the Orwellian scenario, right? The Chinese say they can identify anyone within China, within a second, using face analysis. And globally, they have the computing power, even though perhaps not the data sources, to identify anyone within two seconds. Those are, in a way, scary things. And I often wonder: Does it take more computing power to create deep fakes or to detect them?
Ciprian: Well, that’s the typical chicken-and-egg problem. The real problem is—and this applies to a certain extent to deep fakes—that deep fakes are not yet so sophisticated as to be extremely difficult to detect. But they’re getting there. The same applies to adversarial attacks, which basically play on the fundamental differences between the way the human brain works and the way the machine brain works. No matter what folks tell you, we’re light years away from mimicking the real behavior of the human brain. Even the smartest machine learning model today has a completely different internal way of working than the human brain. And there are documented ways of attacking that specific difference. You see this, for example, in autonomous self-driving cars. There are documented ways in which you can take a few rectangular stickers, stick them on a road sign, and all of a sudden, a yield sign becomes interpreted as “turn right” or “turn left.” We’re not aware of any proven mathematical models that would guarantee success in fighting against these attacks.
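The “fundamental difference” Ciprian describes is easiest to see in a toy example. The sketch below, purely illustrative with made-up weights and labels, uses the idea behind the Fast Gradient Sign Method (FGSM): nudging every input feature in the direction that most increases the model's score can flip the prediction, even when the change looks insignificant to a human. For a linear model, that direction is simply the sign of the weights.

```python
import numpy as np

# Toy linear "road-sign classifier" with invented weights and labels.
# score > 0 means "turn right," otherwise "yield."
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict(x: np.ndarray) -> str:
    """Classify a feature vector with the toy linear model."""
    return "turn right" if x @ w + b > 0 else "yield"

x = np.array([0.2, 0.4, 0.1])  # original input, classified as "yield"

# FGSM-style perturbation: move each feature in the sign of the gradient
# of the score with respect to the input (for a linear model, sign(w)).
epsilon = 0.5
x_adv = x + epsilon * np.sign(w)  # a small, structured change flips the label
```

Real sticker attacks target deep convolutional networks rather than a linear score, but the principle is the same: the perturbation exploits the model's gradients, not anything a human driver would perceive as meaningful.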
We're quite far from being clear on whether we can be effective in dealing with these things. And, as has happened many times in history, eventually the solution isn’t in the theory and isn’t in the technical part. The solution is in educating people to be responsible about these technologies. The solution is in creating a worldwide framework that will help govern the way these technologies are used. Again, it might seem a little bit of a stretch, but it's a good comparison with what happened with nuclear weapons. One of the reasons they weren’t used, except for testing and at the end of World War II, was because of the frameworks that were created around the world.
You already see this starting. You see big companies, like Microsoft, really driving concepts like responsible AI. You see independent bodies doing the same, for example, the folks who created the Montreal Declaration for Responsible AI. People are starting to feel this threat. And what’s encouraging for me is that they're starting to realize that the solution is at a different level. It's not necessarily at the technical level, because that’s a chicken-and-egg problem: We’ll get better at creating, for example, deep fakes, and then we’ll get better at detecting them, and then we’ll get better at creating them, and so on and so forth.
Markus: You and I don't work for Microsoft, but we have a relationship with them. It's nice to see that a company like Microsoft takes the ethics of all of this very seriously. Microsoft has just been voted the most ethical company in North America. It’s very interesting to see that sort of thing.
Ciprian: It’s very important, because the Microsofts, Facebooks, Amazons, and Googles of the world, I believe, together with their immense business success, also carry a huge responsibility in terms of what they deliver to the world. What’s preventing some of the really powerful stuff they’re developing from being used in a non-ethical or non-responsible way? It's really encouraging to see that, at the highest levels, starting from the board and the CEO of the company, Microsoft really gets it and really understands the importance of driving responsible AI and of driving responsibility in this field.
Microsoft really gets it and really understands the importance of responsible AI.
Markus: Some of it, it seems to me, isn't even at that super high level. One of the things that's probably easier to do than a lot of other things is to mine people's data and figure out who the people are that are most easily defrauded, and which group of people you can get through to in order to reach a tipping point. If I remember correctly, in one of your presentations, you talk about Brexit and how a certain subgroup of people was identified and influenced, which was enough to sway the vote toward leaving the EU rather than remaining.
Ciprian: Yeah. That was one of the cases where this type of misuse of technology happened way before the world had the chance to organize itself against that type of abuse. We all know the story of Cambridge Analytica, and how social media was literally exploited by certain individuals. And as bad as it was, it’s also a very important wake-up call. I believe it played an important role in increasing knowledge and awareness of the bad things that can happen when machine learning is used in a non-ethical and non-responsible way.
The Cambridge Analytica scandal was a very important wake-up call for the industry.
Markus: Now to switch to something a little more positive and forward-looking again: Both of us use a lot of Microsoft technologies. In my company, we use the Custom Vision and Computer Vision APIs. Microsoft has the Face API and other things that are really powerful and probably industry-leading by quite a margin. And there are tons of other providers out there. Who else do you work with? Do you use AWS or any of those things as well?
Ciprian: We use a wide range of technologies. If you look at some of the most important tool sets and frameworks today, they’re really vendor agnostic. Think about data preparation and data engineering workloads, as an example. By a large margin, Spark is the place to go. And fortunately, Spark is available in Microsoft Azure as Azure Databricks, Spark is available in AWS, and Spark is available in the Google cloud. This is a very interesting place that we’re in, especially in data engineering and data science, thanks to the immense success of open source platforms and various technologies. I’d dare say you have almost a common denominator, and then it boils down to personal preference and perception on whether to use Microsoft or AWS or Google. Or it boils down to things such as perceiving Microsoft as being more ethical than one of its competitors. Or it boils down to the sheer cost of the platform. So yes, we do a lot of work with other vendors as well. It’s true. But most of our work and most of our workloads run on Microsoft Azure.
Markus: Have you done anything with any of the Asian vendors?
Ciprian: Not yet, no. Haven't had the chance yet. To be very honest with you, I’m a firm believer in competition and competitive markets. And in the world of data science, I fear monopoly. This is why I'm so happy to see, for example, PyTorch slowly but steadily starting to balance out TensorFlow. I don't want to live in a world where the only option for doing serious deep learning is TensorFlow. I want to have options. And that can only be good for us as data scientists, and also for the field in its entirety.
Markus: I couldn't agree more.
It's been a pleasure talking to you, as always. Unfortunately, we are out of time, because both of us have to run off to our next presentations. Good luck with your upcoming talk!
Ciprian: This has been an absolutely great conversation. We'll do it again, maybe at the upcoming Regional Director and MVP Summit in Redmond!