O'Reilly's 2017 Data Science Salary Survey finds that location is the most significant salary determinant for data professionals, with median salaries ranging from $134,000 in California to under $30,000 in Eastern Europe, and highlights that negotiation skills can lead to salary differences as high as $45,000. Other key factors impacting earnings include company age and size, job title, industry, and education, while popular tools and languages - such as Python, SQL, and Spark - do not strongly influence salary despite widespread use.
You're listening to Machine Learning Applied. In this episode, we're gonna talk about money: data scientist salaries and what factors affect them. This is a summary of a survey put out by O'Reilly called the 2017 Data Science Salary Survey: Tools, Trends, Titles, What Pays (and What Doesn't) for Data Professionals.
O'Reilly puts out these surveys once a year, and they've been collecting data from professionals all the way back to 2013 in order to analyze what affects data scientists' salaries: things like location, age, gender, and so on. The survey is a PDF that you can get online.
I'll put a link in the show notes, and I highly recommend you get it and read it. They've got excellent charts and graphs that convey the information more effectively than this audio format can. Their survey is a thorough presentation, whereas what I'm presenting in this episode is just my personal favorite highlights and takeaways, so it'll be biased toward what stood out to me.
So I recommend grabbing the PDF link from the show notes after you listen to this episode and reading it offline. The median salary of data professionals globally is $90,000. All the money in the survey is discussed in US dollars, so that's a $90K global median, which is pretty high, in my opinion.
They say it's up $5,000 compared to last year's median of $85K, so it looks like data jobs are on the rise. I think that's no surprise to any of us here; we all see machine learning as sort of the way of the future. When we break it down by world region, the United States has the highest median salary at $112K, and they note that's nearly double the Western European median of $57K.
So where you live is a very, very strong factor. Australia and New Zealand come in second place behind the United States, and the lowest-paying region is Eastern Europe, at under $30,000. Everybody else is scattered in the middle. Asia in particular has a very wide interquartile range, meaning it's tough to pin down what's typical in that region.
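A quick way to see what a "wide interquartile range" means: the IQR is the distance between the 25th and 75th percentiles (Q3 minus Q1). The Python sketch below uses two made-up salary samples, not survey data, that share the same median but have very different spreads. It relies only on the standard library's `statistics` module; `median_and_iqr` is a hypothetical helper name.

```python
import statistics

# Hypothetical salary samples (thousands of USD), invented for illustration only.
# Both "regions" share the same median, but region B is far more spread out.
region_a = [70, 75, 78, 80, 82, 85, 90]
region_b = [20, 40, 60, 80, 100, 130, 160]


def median_and_iqr(salaries):
    """Return (median, IQR), where IQR = Q3 - Q1 (inclusive quantile method)."""
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    return statistics.median(salaries), q3 - q1


for name, sample in [("A", region_a), ("B", region_b)]:
    med, iqr = median_and_iqr(sample)
    print(f"Region {name}: median={med}K, IQR={iqr}K")
# prints:
# Region A: median=80K, IQR=7.0K
# Region B: median=80K, IQR=65.0K
```

Both samples report the same median, which is why a median on its own can hide how spread out salaries are.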
You can quote the median, but the spread around it is very large. Within the United States, again no surprise, the highest-paying state is California (we all know that's the Bay Area), with a median salary of $134,000. So that's a big difference between Western Europe at $57K and California at $134K.
That's well over double the salary. I myself am in the Pacific Northwest region of the United States. Oregon specifically doesn't have a competitive salary compared to the Bay Area or Seattle, so from time to time I weigh whether I should just up and move to one of these cities.
Because you can make a substantially different salary by living in one of these hotbeds. Next they talk about gender. Women's salaries are indeed lower than men's, by about $8K at the median. At the same time, respondents are 20% female and 80% male, so it's definitely a male-dominated field, but the number of female respondents has doubled in recent times, which suggests many more women are getting into the space.
Age is an interesting one. It looks like the older you get, the more you make, which naturally comes with experience, but also the older you get, the fewer the respondents. So are these people moving into executive roles, or getting pushed out of their roles? We've seen that fiasco in the news of IBM pushing a lot of its employees over 50 out of the company.
Who knows? A few pages later, they talk about industry, since different industries pay different amounts. You'd expect education and nonprofits to be on the low end, and maybe something like banking to be on the high end. But the industry numbers actually threw me for a loop: search and social networking are some of the higher-paying industries.
That would make sense if we're talking about people employed at Facebook and Google. Media and entertainment roles also pay very well, so that's something to keep an eye on if you can get an offer from a company in media or entertainment. And banking and finance actually doesn't pay quite as much as it used to.
Next up, they talk about education, specifically the difference in salary between a master's and a PhD. They don't even mention bachelor's; it's assumed the bare minimum is a master's. That speaks to a recent episode of mine where I discussed education. It looks like the difference between having a master's and a PhD is on the order of $5,000.
So it's actually not that much. Again, everything here is medians, which means we're not considering individual context, so your mileage may vary. But this data might imply there's no need to go on to get your PhD unless that's something you fancy or you want to work in research. Next, they talk about company age and size.
This is an interesting one. Younger companies tend to pay more, specifically companies between two and five years old. That's what they consider a young company; anything over five years old is an older company. They have some theories about why: after about two years, a company likely has some investment and has to compete for talent with larger companies like Google, Facebook, and Amazon.
A company less than two years old, on the other hand, likely has no capital to work with. Its salaries are about $40,000, which is less than half the global median. So by this data, avoid companies that are less than two years old. Or, if you're interviewing with a company that looks great but is less than two years old, you may want to ask them about their finances before you go too far down the rabbit hole in the interviewing process.
So target companies between two and five years old. At the same time, though, the larger the company in headcount, the higher the salary, and once a company is past the five-year mark, it's typically also a larger company. So these two effects may kind of cancel each other out.
I liked that data point because I personally prefer to work for startups, and I always wondered whether startup salaries are much lower on average than at larger companies. It looks like it's kind of a wash. Job titles: this is an interesting one. They found that people who position themselves as data scientists or data analysts have a median salary of $87K, compared to $80K for people who position themselves as engineers.
So there's a $7,000 difference depending on how you position yourself, which is important for a lot of you listeners. I personally have been positioning myself as a machine learning engineer.
I don't know, maybe that makes a difference. So by this data, maybe you should call yourself a data scientist. Of course, don't lie: if your role is support engineer, don't say you're a data scientist. But if it could go either way, maybe pick the title data scientist. And then, of course, as we get up to VP and director, they make a lot more.
And the CTO roles, of course, make the most, at a $150,000 median. I'm gonna skip some sections on time spent in meetings, time spent coding, and length of work week. Then the survey gets into subjective assessments. One of these sections is: how easy do you think it would be for you to find a new role if you had to?
And this makes a difference in salary. Another section is called self-assessed bargaining skills. Respondents ranked from one to five how good they think they are at negotiating their salary when the time comes and they get an offer from a company.
The company says, hey, $70K, and the respondent negotiates back: hey, look, people in my position make a little bit more. How good do you think you are at that, on a scale from one to five? The difference between a one and a five is big: respondents who rated themselves a one have a median salary of $70K, and those who rated themselves a five have a median of $115K.
Going from $70K to $115K is a $45,000 difference in salary, which makes it, if I'm not mistaken, the second-biggest salary differentiator in the survey. The biggest differentiator is where you live, and specifically whether you live in the Bay Area or not. But man, $45,000.
That is a huge chunk of cash. The takeaway is clear: we all need to get way better at negotiating our salaries and take it very, very seriously. When it's time to get an offer from a company, you really need to bring your bargaining skills, because that could be a $45,000 difference.
So keep that one in your back pocket. Maybe you can't up and move to Silicon Valley, but you can start working on your negotiation skills. That's an important skill to have, according to this survey. All right, let's talk tools: programming languages and operating systems. This is the fun stuff, right? They found that 67% use Windows.
55% are on Linux, 18% Unix, and 46% Mac (respondents could pick more than one), but they are finding that Windows is on the decline and Unix-like systems are on the rise. Programming languages. All right, drum roll; I think this is everyone's favorite topic. I'm just gonna read these percentages right out. SQL is at the top at 64% (S-Q-L or "sequel," however you prefer to say it).
Python is at 63% and R at 54%. Then we have what they call the long-tail languages: Bash, JavaScript, Java, Scala, C, C++, and C#. Those are much less common than Python and R; Java and Scala are at 18% and 13%, respectively. You might consider lumping them together, since Scala is a JVM language, and separating them out may be hurting its score.
So you might give Scala a little bit of a bump. Then C and C++ are at 9% and 8%. Those are way down the list, and that makes sense: they're more difficult languages to write in, and if you're working in C or C++, you're likely working on embedded systems or robotics, something a bit more hardcore.
Something where you're working close to the metal. So Python is the most popular of the general-purpose programming languages, followed by R. Those two are going head to head, with Python in the lead, and everybody else below them. Now, they say SQL is the most popular of these languages. SQL is Structured Query Language.
It's for querying relational database management systems, whether that's MySQL, PostgreSQL, SQL Server, and the like. I wouldn't really throw it into the same mix as the other languages, but it's good to know how popular it is, because it tells you that these professionals are using SQL in their professional setting, which means their employers are using and requiring SQL.
They have RDBMSs, they have SQL databases, which suggests that the most common data storage format out there is data stored in a SQL database. So it behooves you to get to know SQL if you don't already, because as a data professional you're gonna be dealing with data, and it sounds like the most common place that data lives professionally is a SQL database.
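If SQL is new to you, here is a minimal sketch of the kind of query a data professional writes every day, using Python's built-in `sqlite3` module and an in-memory database. The `salaries` table and its rows are hypothetical, invented for illustration rather than taken from the survey. SQLite has no built-in median function, so the sketch aggregates with `AVG` instead.

```python
import sqlite3

# In-memory database with a hypothetical "salaries" table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (title TEXT, region TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [
        ("data scientist", "US", 120000),
        ("data scientist", "Western Europe", 60000),
        ("engineer", "US", 110000),
        ("engineer", "Western Europe", 55000),
    ],
)

# A typical analyst query: average salary and row count per region, highest first.
rows = conn.execute(
    """
    SELECT region, AVG(salary) AS avg_salary, COUNT(*) AS n
    FROM salaries
    GROUP BY region
    ORDER BY avg_salary DESC
    """
).fetchall()

for region, avg_salary, n in rows:
    print(f"{region}: {avg_salary:,.0f} USD across {n} respondents")

conn.close()
```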
Data has to come from somewhere, and it's probably gonna come from SQL, so get to know it if you don't already. But that's sort of a different topic. Moving over to the general-purpose programming languages, Python is the most popular, followed by R, and then everything else in the long tail.
C and C++ are the ones that make the most money, and that makes sense. Not only are those languages harder to use than Python, but the applications you typically use them for are more hardcore as well: high-frequency trading, robotics, systems engineering, and the like.
But the median difference in salary between a Python engineer and a C engineer might not be worth making that big switch. As we'll talk about at the end of this episode, they have a summary section where you pick different factors in your life and it tells you what salary you should be looking at, and programming language doesn't make the cut as one of those differentiators. It doesn't seem to be a huge differentiator in salary.
As far as relational databases go, the most popular is MySQL at 37%, followed by Microsoft SQL Server at 30%, followed by PostgreSQL at 28%. I like Postgres personally; that's what I use. But again, here they point out, and I quote, "knowing the most popular databases isn't a great differentiator when it comes to salary." So it turns out languages and tools don't make a huge difference in the money.
But it's interesting to see which languages and databases are more popular, as an indicator of where to focus your efforts. If your technology hasn't already been chosen for you by your employer, it sounds like the answer is Python plus MySQL.
They have a whole section on search engines like Elasticsearch, Solr, and Lucene, and big data platforms like Spark and Hadoop. Elasticsearch is the most popular of the search tools, but Solr is the highest-paying as far as search engines go. Spark is the most popular of the big data platforms, followed by Hive, then MongoDB, and most of the popular big data platforms pay about the same.
It looks like the highest-paying are Google BigQuery/Fusion Tables and Couchbase, but I'm completely unfamiliar with those two. If you look at this bar chart, though, they show Spark as immensely more popular than its competition, so Spark is definitely something to keep an eye on.
See if it's something you'll want to learn. They also talk about business intelligence and reporting tools, namely Excel. Everyone in the world uses Excel, so you should probably get to know it; there's not really much else to say about it. Moving to machine learning, 37% of respondents use scikit-learn and 16% use Spark MLlib.
I do know about Spark, which is a distributed, MapReduce-style big data tool. If you have a data warehouse with terabytes and terabytes of data, you use Spark or Hadoop, and it looks like Spark is the most popular. Spark ships with a library called MLlib that brings machine learning into your Spark pipeline.
So for huge, enormous data like you'd be dealing with in that case, I don't think scikit-learn, which runs on a single machine, would really work out, so you'd use MLlib for Spark. For visualization tools, we've got ggplot2, which is an R library, and Matplotlib in Jupyter for Python. Some people use JavaScript's D3 library for more heavy-duty visualizations and analytics.
Everything else is in the long tail. I think I'm actually gonna do an episode on visualization tools, comparing Bokeh to Matplotlib to Seaborn and all those things. So it's good to know that, for our concerns as Python people, Matplotlib is the most popular, and you could probably just stick with it.
They've got a few more sections in there. Like I said, do download the PDF and go through it yourself; I'm sure there's stuff I didn't cover that you'd find much more interesting. But before we stop: at the very end of the survey, they cover the biggest differentiators, and you just pick, say, what country you're in, how much experience you have, and the size of the company you're applying to.
And it tells you your projected salary. The takeaway from that section is that they chose the top differentiators using a model with an R² of about 0.6. As they put it, this model explains approximately 60% of the variation in the sample salaries. So in cases where a factor looked like a big differentiator but had too much variance to be useful for deciding how much money you're worth, they'd throw it away.
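To make "explains approximately 60% of the variation" concrete: R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The Python sketch below fits a one-feature least-squares line to five made-up (years of experience, salary) points, not survey data, and computes R². The helper names `fit_line` and `r_squared` are hypothetical; the survey's own model uses many factors at once, so this only illustrates the metric.

```python
# Synthetic (made-up) data: years of experience vs. salary in thousands of USD.
experience = [1, 2, 3, 4, 5]
salary = [50, 65, 60, 80, 70]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs, ys, a, b):
    """R^2 = 1 - SS_res / SS_tot: the share of variance in ys the line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


a, b = fit_line(experience, salary)
r2 = r_squared(experience, salary, a, b)
print(f"salary ~ {a:.1f} + {b:.1f} * years, R^2 = {r2:.3f}")
# prints: salary ~ 48.5 + 5.5 * years, R^2 = 0.605
```

Here the line accounts for about 60% of the variation in these toy salaries; the remaining 40% is spread the model can't explain, which is the same sense in which the survey's model "explains approximately 60%."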
So these factors have high confidence of being salary differentiators: world region or US region, experience, gender, company size, education (though they only show that as about a $5,000 annual difference), job title, and industry. So those factors are very important to pay attention to.
They can affect your salary the most. That's it for today. Go to the Patreon page, where you'll find the show notes, download the PDF, and browse through it in your spare time.