Using public datasets for DevRel

Felipe Hoffa
DevRelCon Earth 2020
30th June to 10th July 2020
Online

Felipe Hoffa shows how he uses public datasets and the ways they help him be more effective.

Transcript

Felipe Hoffa: So my name is Felipe Hoffa. I'm a developer advocate for Google. I've been at Google for nine years so far, seven of them as a developer advocate. First I was a software engineer, but then I discovered that this is my one true path. And I'm super happy to be at this conference with other DevRel people.

What I really care about, what I love doing, is data. So I'm a developer advocate at Google, I love data, and I love traveling, though I haven't been able to travel much lately.

And, of course, I love asking questions. I'm a curious person, and that's what data analysis is about. In this talk, I want to share with you some of my favorite datasets for looking at DevRel. But, of course, right now we are going through a global pandemic.

A lot of things are changing, so let me show you some of the work I've done related to this. For example, I took all the mobility data that Apple is publishing and put it inside BigQuery. BigQuery is my favorite data warehouse.

My job is basically to talk about BigQuery, and it's perfect for this. So I took all of the Apple mobility data, put it in Data Studio, and now I share dashboards that show, for example, the cities in the United States that are staying at home the most right now. And you can see that, yeah, in March there was a big, big dip, but then the whole United States started going out again.

And you can see that the cities staying at home the most right now are New Orleans, Honolulu, Miami, Phoenix, and San Francisco. But a lot changes through time. This is my way of traveling around the world. Now I can take all of this data and compress it even more. Here, the chart in the middle shows a lot of cities around the US.

And you can see the one in the lighter color, that's New Orleans. That's the one that has complied the most with staying at home. Further up, you can see San Francisco too. But you can also see where most of the United States is moving. On the left, you can compare with the situation in Europe, which stayed home more than other places.

And on the right, you can see Latin America. This is how I can see that the whole of Latin America is right now staying at home more than the rest of the world, at least in this chart. That's probably because it's winter there right now. The darker line, that's Brazil. Brazil hasn't been good at staying at home.

And this will be interesting in a minute because, on one side, we have the whole mobility situation around the world as seen by Apple, but this also affects how we are doing DevRel and how people are behaving. And we can compare this data, the general mobility, with what's happening in our field. So besides the mobility data, I've also been looking at Meetup. I'm collecting every Meetup RSVP in real time. I still have to write a blog post sharing how to collect all this data and how you can use it.

What's interesting in this chart is that I'm looking at the countries that have the most meetups, and with the two colors you can compare the number of RSVPs this year versus last year. And you can see that in the United States, back in March, it quickly dropped to half of the RSVPs that Meetup used to receive daily. You can see how every country is behaving in a different way. Singapore is really interesting: it had a big drop after the rest of the countries were hit, then it kind of came back, and now it's going down.
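The year-over-year comparison in that chart can be sketched in Python. This is a minimal sketch, not the query behind the talk: it assumes the collected RSVPs have already been reduced to hypothetical (country, day) pairs, aggregates by month for simplicity, and skips months with no data the previous year so that a gap in collection doesn't show up as a fake drop.

```python
from collections import defaultdict
from datetime import date


def yoy_ratio(rsvps):
    """Return {(country, year, month): RSVPs / RSVPs in the same month a year earlier}.

    `rsvps` is an iterable of (country, day) pairs, one per RSVP
    (a hypothetical shape; the talk doesn't describe the raw format).
    """
    counts = defaultdict(int)
    for country, day in rsvps:
        counts[(country, day.year, day.month)] += 1

    ratios = {}
    for (country, year, month), n in counts.items():
        previous = counts.get((country, year - 1, month))
        if previous:  # no data for last year (e.g. a collection gap): skip
            ratios[(country, year, month)] = n / previous
    return ratios


# Toy rows, not real Meetup data: 4 US RSVPs in March 2019, 2 in March 2020.
rows = [("US", date(2019, 3, 5))] * 4 + [("US", date(2020, 3, 5))] * 2
print(yoy_ratio(rows))  # {('US', 2020, 3): 0.5}
```

A ratio of 0.5 is the "dropped to half" pattern described for the US in March.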

The gap in July for last year is because I'm just missing some weeks of data from last year. And what's also interesting is that every country is interesting in itself. Hong Kong is back to normal. New Zealand had a big drop, and now it's back to normal. There are different ways of handling the pandemic, as you might know from the news.

Now, that's not all. A lot of people are now using Meetup for online events. We are doing this event online. I was scheduled to give this talk in Tokyo, but now I'm doing it from home. Here, you can find out what percentage of Meetup RSVPs are for online events.

And you can see that the US didn't only drop to half of its RSVPs for Meetup events; now half of those RSVPs during July are for online events. In England, that's 60%. And what I find really, really interesting: if you go to Brazil, even though Brazil is going out according to Apple's data, on Meetup, among the people who have access to Meetup and are RSVPing to events, 90% are staying at home. So you can see that there is a difference between the general population and people interested in events.
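The online share is a per-country ratio of two counts. A minimal sketch, assuming each RSVP has already been labeled with a hypothetical `is_online` flag (how an event gets detected as online isn't covered in the talk):

```python
from collections import Counter


def online_share(rsvps):
    """Percentage of RSVPs that are for online events, per country.

    `rsvps` is an iterable of (country, is_online) pairs.
    """
    total = Counter()
    online = Counter()
    for country, is_online in rsvps:
        total[country] += 1
        if is_online:
            online[country] += 1
    return {country: 100.0 * online[country] / total[country] for country in total}


# Toy rows, not real Meetup data.
rows = [("BR", True)] * 9 + [("BR", False)] + [("US", True), ("US", False)]
print(online_share(rows))  # {'BR': 90.0, 'US': 50.0}
```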

Brazil, even though it's the country that's going out the most in South America, still has one of the largest percentages of people staying at home for events. And then I started looking at different things. For example, since April 2020, what are the top online Meetup groups? The top groups, of course, are mostly doing their stuff online. And you can see that the group that has seen the most unique people joining its events, with 240 events, is UT Dallas Computer Science Outreach, followed by Data + AI Online Meetups, Silicon Valley Data Science, MLI platform.

Number four, Women Who Code SF. Number five is Bitcoin; they do a lot of it now. And number six, that's a very cool one for me. That's Live with Google Cloud Developer Relations, a Meetup group that I started with my teammate, Yufeng.

We started this because Yufeng was doing a lot of livestreams. I joined his livestreams, more teammates joined, and we started a group out of nowhere because we needed a place to put our stuff together. And, suddenly, we are the sixth group in the US by number of people joining it.
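Ranking groups "by number of people joining" means counting distinct members, not raw RSVPs, so someone who attends every livestream counts once. A minimal sketch, assuming hypothetical (group, event_id, member_id) RSVP tuples:

```python
from collections import defaultdict


def top_groups(rsvps, n=10):
    """Rank groups by how many distinct members RSVPed to any of their events.

    `rsvps` is an iterable of (group, event_id, member_id) tuples.
    Returns up to `n` (group, unique_members, events) tuples, largest first.
    """
    members = defaultdict(set)
    events = defaultdict(set)
    for group, event_id, member_id in rsvps:
        members[group].add(member_id)  # a member counts once per group
        events[group].add(event_id)
    ranked = sorted(
        ((group, len(members[group]), len(events[group])) for group in members),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:n]


rows = [
    ("group-a", 1, "ana"),
    ("group-a", 2, "ana"),  # same person, second event: not counted again
    ("group-a", 2, "bo"),
    ("group-b", 3, "ana"),
]
print(top_groups(rows))  # [('group-a', 2, 2), ('group-b', 1, 1)]
```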

As I said, this is really interesting wherever your field is within DevRel: which meetups should you contact, what topics are trending? You can use data like this to go around the world and see who is listening and what topics are working. And instead of only finding the top groups, I can also find the top events. These are the top 15 events by number of people who have RSVPed since April 2020 in the US for developer topics. Look at that.

Number one is my own event. I organized that one with Yufeng too. That was to celebrate BigQuery's tenth birthday, and I'm super proud to show that, well, we have the top event. In terms of metrics, this is a really cool thing to show internally. How do I show my manager and the rest of the company that what we're doing as DevRel has value and that we are getting engagement?

I can show them that we got the event with the most RSVPs in the US for developer meetups. And you can see other topics; a lot of them are related to data and ML. That's probably going super hot on Meetup right now. But, for example, number 15 is about Django.

There's a lot that you can do here. And one fun thing about online events: they are not capped by the number of people anymore. It's not like in-person events, where there's a cap on how many people you can fit in a room. Let me change topics. This talk is about different datasets that you can use.

I already showed you Meetup. Let's look a little bit at Stack Overflow. This is a blog post that I published last week. Here, I'm looking at the percentage of views that each tag is getting, quarter by quarter. The interesting part here is that I'm not looking only at historical data.

I'm using BigQuery and predictions, time series predictions, to look at where the trends are going. And you can see here that Python is eating the world. It's the only language growing like crazy. JavaScript is holding its ground, and Java is just going down in the percentage of attention it's getting. And thanks to being able to project time series into the future, you can see what the top 10 tags of 2023 would be if the current trends continue.
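BigQuery's forecasting is beyond a short example, so here is a swapped-in technique: a plain least-squares linear trend per tag, extended a few quarters forward. It only illustrates the idea of ranking tags by projected rather than current share; the input shape (quarterly view-share percentages per tag) and the toy numbers are assumptions.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*t for t = 0..len(ys)-1 (needs 2+ points)."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    b = num / den
    a = y_mean - b * t_mean
    return a, b


def project(ys, steps):
    """Extend a quarterly series `steps` quarters past its last point."""
    a, b = fit_line(ys)
    n = len(ys)
    return [a + b * (n + k) for k in range(steps)]


def projected_ranking(shares, steps):
    """Rank tags by their projected share `steps` quarters ahead."""
    final = {tag: project(ys, steps)[-1] for tag, ys in shares.items()}
    return sorted(final, key=final.get, reverse=True)


# Toy shares (percent of views per quarter), not real Stack Overflow data.
shares = {"python": [10, 12, 14], "java": [15, 13, 11]}
print(projected_ranking(shares, steps=4))  # ['python', 'java']
```

A straight line can extrapolate a shrinking share below zero, which is one reason a proper time-series model is the better tool for real projections.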

And you can see that there are two tags that are not at the top right now but are growing really, really fast: ReactJS and Angular. Of course, when you present charts like this on Twitter, a lot of people get angry, especially when you show them the languages that are going down, like Java, C#, PHP, Ruby, Objective-C, and Perl. Those are losing attention. People don't like it when you tell them that, but that happens. For the languages that are gaining attention, in the top right chart, you can see TypeScript and Dart.

Laravel is not a language, so I should take that out. Then Go, Kotlin, Apps Script, and Rust. And, yeah, the highest-growth language, the biggest trend right now, if you had to pay attention to one, might be Dart. Dart is a language to pay attention to. Again, I'm using all of these datasets.

I'm sharing these insights in real time, or almost in real time; I'm taking the pulse of what's happening around the world. And the good news is that you can use these datasets too. Now this is the part of the talk where it's going to get a little crazy, because I have, like, seven minutes to present 50 more slides. But this is all about sharing with you, in twenty minutes, what kind of datasets you can use to see what's happening around the world, understand your company, etcetera.

The next slides are from a conference I was invited to speak at last year: Linaro Connect in Bangkok. A lot of people from around the world go to this conference. It's a conference around the ARM processor and what this company is doing with open source. What do I know about the ARM processor?

Not much. Why would they invite me to give a keynote at their conference? Because I can take all of my datasets and present a story that's about them. What I want to share with you in the next few minutes is what kind of stories you can tell about your company using these datasets. How can I go into any conference of any community, look at their data, and share what's important for them?

So here's what I did in that talk. First, I tell people that there's a lot of data. GitHub has a lot of data, and I have a lot of GitHub inside BigQuery, so I can start analyzing it.

I can answer questions like, for example, spaces versus tabs: do people use spaces or tabs? Well, I got my results, and I'm going to skip over this really fast. It turns out a lot of people prefer spaces to tabs, except in Go.
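The spaces-versus-tabs question boils down to classifying each file and tallying per language. This is not the query used for those results; it's a minimal sketch that assumes file contents are available as (path, text) pairs, uses the file extension as a rough stand-in for the language, and lets each file vote by its indented lines.

```python
from collections import Counter, defaultdict


def indentation_style(lines):
    """Return 'spaces', 'tabs', or None for a file's dominant indentation."""
    spaces = sum(1 for line in lines if line.startswith(" "))
    tabs = sum(1 for line in lines if line.startswith("\t"))
    if spaces == tabs:  # unindented files and ties don't vote
        return None
    return "spaces" if spaces > tabs else "tabs"


def tally_by_extension(files):
    """Count files per indentation style, keyed by file extension.

    `files` is an iterable of (path, text) pairs.
    """
    tally = defaultdict(Counter)
    for path, text in files:
        ext = path.rsplit(".", 1)[-1] if "." in path else ""
        style = indentation_style(text.splitlines())
        if style:
            tally[ext][style] += 1
    return tally


files = [
    ("main.go", "func f() {\n\treturn\n}\n"),
    ("app.py", "def f():\n    return 1\n"),
    ("flat.py", "x = 1\n"),
]
for ext, styles in tally_by_extension(files).items():
    print(ext, dict(styles))
# go {'tabs': 1}
# py {'spaces': 1}
```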

And there are so many other questions that you can look at in GitHub. And, of course, GitHub is relevant for DevRel. For example, you can look at the top countries using GitHub, overall or per capita, or at what happens when it's cold versus when it's hot. I don't have time to stop here right now, but please engage with me on Twitter. What companies are contributing to open source?

I did this for 2017 and 2018. I need to update the chart for 2020, but in this chart you can see that Google and Microsoft are the companies contributing the most by number of projects, number of stars, and number of people. This is what the open source environment looks like. But that's just one dataset, and you can go beyond GitHub.

I already showed you a little bit of Stack Overflow. I'm also looking at Hacker News, and I'm looking at Reddit. A lot of what we do happens on Hacker News. Wikipedia is also a really, really cool place to look at, because a lot of people go to Wikipedia to understand new concepts. I can even look at datasets like Python package installs.

Every time a package gets installed, I'm getting a ping of that data. There is a full dataset, and you can see how things change. Let me show you, for example, stories about ARM using these datasets. This is the ARM web page, and Linaro is the company I gave this keynote to. And Gong Li is an actress.

It turns out that their CEO is Li Gong, who is a computer scientist, and they love this joke because it turns out both Wikipedia pages exist. You can use Wikipedia: a query like this, a little bit of SQL, will show you how all of these pages are trending through time. And, yes, ARM architecture gets a lot more page views than Linaro. And for Li Gong, the computer scientist, the CEO, you can see how his page started getting page views once he joined Linaro.
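The talk uses a little SQL for this; the same grouping can be sketched in Python, the language used for the examples here. A minimal sketch, assuming hourly pageview rows are already available as (datehour, title, views) tuples; the titles and numbers are toy values, not real Wikipedia data.

```python
from collections import defaultdict
from datetime import datetime


def monthly_views(rows, titles):
    """Sum hourly page views into per-month totals for the given titles.

    `rows` is an iterable of (datehour, title, views) tuples, an assumed
    shape for an hourly pageviews table; titles not in `titles` are ignored.
    """
    wanted = set(titles)
    totals = defaultdict(lambda: defaultdict(int))
    for datehour, title, views in rows:
        if title in wanted:
            totals[title][(datehour.year, datehour.month)] += views
    # Plain dicts with months in chronological order, ready to plot.
    return {title: dict(sorted(months.items())) for title, months in totals.items()}


rows = [
    (datetime(2019, 1, 1, 0), "Linaro", 3),
    (datetime(2019, 1, 2, 5), "Linaro", 2),
    (datetime(2019, 2, 1, 0), "Linaro", 4),
    (datetime(2019, 1, 1, 0), "Unrelated_page", 100),
]
print(monthly_views(rows, ["Linaro"]))  # {'Linaro': {(2019, 1): 5, (2019, 2): 4}}
```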

And, of course, there you also have Gong Li, the actress, just to compare. With Wikipedia, you can even see how people click from page to page. So, for example, when people land on the ARM architecture page, what do they look at next? Or you can see where they come from: when they land on the Li Gong (computer scientist) page, where are they coming from?
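Both questions, where people go next and where they come from, are the same grouping over page-to-page click counts, filtered on a different side. A minimal sketch, assuming rows shaped like the Wikipedia clickstream dumps, (prev, curr, type, n) per link; the page titles and counts below are toy values.

```python
from collections import Counter


def top_referrers(rows, target, k=5):
    """Pages people came from before landing on `target`, most clicks first.

    `rows` is an iterable of (prev, curr, type, n) tuples, where `n` is the
    number of clicks from `prev` to `curr` (possibly given as a string).
    """
    counts = Counter()
    for prev, curr, _type, n in rows:
        if curr == target:
            counts[prev] += int(n)
    return counts.most_common(k)


def next_pages(rows, source, k=5):
    """Pages people clicked to after visiting `source`, most clicks first."""
    counts = Counter()
    for prev, curr, _type, n in rows:
        if prev == source:
            counts[curr] += int(n)
    return counts.most_common(k)


rows = [
    ("Gong_Li", "Li_Gong", "link", "40"),
    ("other-search", "Li_Gong", "external", "25"),
    ("ARM_architecture", "X86", "link", "7"),
    ("ARM_architecture", "Linaro", "link", "3"),
]
print(top_referrers(rows, "Li_Gong"))       # [('Gong_Li', 40), ('other-search', 25)]
print(next_pages(rows, "ARM_architecture"))  # [('X86', 7), ('Linaro', 3)]
```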

A lot of them were coming from the Gong Li page, the actress, and they were just really, really interested in the computer scientist. That's a lot of Wikipedia. Now, Hacker News. Having the whole of Hacker News inside BigQuery is awesome if you want to see how topics trend among developers and hackers. The story of Redis, for example, was really told via Hacker News.

That's where it was presented, and it got popular. But I'm not telling that story today because, at an ARM conference, I will tell stories about ARM. So, for example, I can find the first ARM stories on Hacker News, which ones got the most votes, and how the interest has been flowing. Same with Stack Overflow.

What were the first questions about ARM? Or, even more interesting, what are the top Stack Overflow questions about ARM? If you look at the page views for each question, they will tell you what's happening around your technology and what people are interested in. What's the top question for ARM? How does it differ from x86?

What's the second one? Someone can't find an ARM command. You should go look at the data for your company, for your products, and fix those questions. And every question trends differently through time; we can look at that too.
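Finding the first and the most-voted ARM stories, or the top ARM questions, comes down to a filter plus two sorts. A minimal sketch over hypothetical dicts with `title`, `time` (Unix seconds), and `score` fields; the field names are assumptions (for Stack Overflow, a view count could play the role of `score`). The case-sensitive whole-word match is a design choice so that words like "alarm" or "harm" don't count, at the cost of missing "Arm" written in mixed case.

```python
import re

# Whole word, case-sensitive: matches "ARM" but not "alarm", "harm" or "farm".
ARM_WORD = re.compile(r"\bARM\b")


def arm_items(items):
    """Keep items (stories or questions) whose title mentions ARM."""
    return [it for it in items if it.get("title") and ARM_WORD.search(it["title"])]


def first_and_top(items, k=3):
    """Return the k earliest and the k highest-scoring ARM items."""
    matches = arm_items(items)
    first = sorted(matches, key=lambda it: it["time"])[:k]
    top = sorted(matches, key=lambda it: it["score"], reverse=True)[:k]
    return first, top


items = [
    {"title": "ARM chips in laptops", "time": 300, "score": 150},
    {"title": "Fire alarm hacks", "time": 100, "score": 999},
    {"title": "Why ARM won mobile", "time": 200, "score": 120},
    {"title": None, "time": 50, "score": 1},  # e.g. an item with no title
]
first, top = first_and_top(items)
print([it["title"] for it in first])  # ['Why ARM won mobile', 'ARM chips in laptops']
print([it["title"] for it in top])    # ['ARM chips in laptops', 'Why ARM won mobile']
```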

I wrote a blog post about it; I can share it later. Let me finish in the last few minutes. For example, let's look at how people are installing Python packages. Which Python packages are more interesting now?

If you go to an ARM conference, you may want to show the top downloads for ARM CPUs. These are the top downloads. But what's even more interesting is to start running your own queries, looking at how the data changes through time and finding the most interesting patterns depending on your criteria. For example, here I'm normalizing: I'm looking at the number of different systems installing each package instead of just raw numbers. This tells you that the top Python packages for ARM are tzupdate, RPi, autobreadcrumbs, the Google Assistant library, and Adafruit.
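The talk doesn't spell out the exact normalization. One possible reading of "looking at the different systems installing it instead of raw numbers" is sketched here as an assumption: count each system once per package, then compare a package's share of ARM systems with its share of all systems (a lift ratio above 1 means the package is over-represented on ARM). The (package, system_id, is_arm) tuple shape and the `system_id` field are hypothetical.

```python
from collections import defaultdict


def arm_lift(installs):
    """Rank packages by how over-represented they are on ARM systems.

    `installs` is an iterable of (package, system_id, is_arm) tuples. Each
    system counts once per package, however many times it installed it.
    Lift = (share of ARM systems with the package) /
           (share of all systems with the package).
    """
    systems = defaultdict(set)
    arm_systems = defaultdict(set)
    all_ids, arm_ids = set(), set()
    for package, system_id, is_arm in installs:
        systems[package].add(system_id)
        all_ids.add(system_id)
        if is_arm:
            arm_systems[package].add(system_id)
            arm_ids.add(system_id)
    if not arm_ids:
        return []
    lift = {}
    for package, ids in systems.items():
        arm_share = len(arm_systems[package]) / len(arm_ids)
        overall_share = len(ids) / len(all_ids)
        lift[package] = arm_share / overall_share
    return sorted(lift.items(), key=lambda kv: kv[1], reverse=True)


# Toy installs: a1, a2 are ARM systems; x1, x2 are not.
installs = [("rpi", "a1", True)] * 5 + [
    ("rpi", "a2", True),
    ("requests", "a1", True),
    ("requests", "x1", False),
    ("requests", "x2", False),
]
for package, value in arm_lift(installs):
    print(package, round(value, 2))
# rpi 2.0
# requests 0.67
```

Five repeated installs from one machine count the same as one, which is what keeps a single busy CI box from dominating the ranking.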

And it shows you that people are using ARM and Python a lot for mobile devices. Going back to GitHub: on GitHub, you can look at all of the issues around every package. So, for example, here I'm looking at all the issues on GitHub that have the ARM tag. And you can see, oh, these are the projects that care about this.

And you can look further at your own project. You can look at all projects and see who is mentioning your technology. In my last minute, I can show you how Hacker News and Linaro were part of the conversation and what was more important. All of this is to tell you that there's a lot of data, and a lot of these datasets are already shared.

The only thing you need to do is create a BigQuery account. It's free, and you don't need a credit card. I love sharing this kind of insight with you, but the most important part is how you can use these datasets to drive your conversation. I didn't have time today to go into other datasets that are really important, like the ones on the private side.

You can get your YouTube analytics inside BigQuery and analyze how your videos are doing. You can also get your own Google Analytics data inside BigQuery, join it with the rest of your datasets, and create a 360-degree view of what's happening. Check my blog post, where you can find out how to use BigQuery without a credit card. And with that, I think my time is up.