Data scientist interviews over the past four years: a summary of my experience

Posted by SOD (original poster)
Original post by chicago-batman, 2019-11-23 21:04:09

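I have been working as a data scientist in Silicon Valley for four years. I have been a manager and have hired quite a few people. Over that time I have also interviewed at companies large and small, and I've found that data scientist is a role with very broad requirements that differ enormously from company to company.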
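Below are the topics that basically every company asks about in some form. Apart from my 2015 Amazon data scientist interview, where the bar raiser gave me a LeetCode algorithm question, I have not been asked a single LeetCode algorithm question in these four years. However, easy/medium LeetCode SQL questions come up often.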
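For experienced data scientists, system design also comes up often, especially how you deploy machine learning models in your current job, how you design ETL pipelines, streaming machine learning modeling, etc. Here are the common questions I've collected (most of my notes were written in English):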
1. SQL:
Almost all data analyst and data scientist interviews involve SQL!!!!
All companies test JOINs and GROUP BY!!!!
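For a quick refresher on the JOIN + GROUP BY pattern, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `users` and `orders` tables and their columns are hypothetical, made up purely for illustration; real interview schemas will differ.

```python
import sqlite3

# Hypothetical toy schema (illustrative only): users and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users  VALUES (1, 'US'), (2, 'US'), (3, 'CA');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 3, 7.5);
""")

# LEFT JOIN keeps users with no orders (user 2); GROUP BY aggregates per country.
# COUNT(o.order_id) skips the NULLs produced by the LEFT JOIN, and
# COUNT(DISTINCT u.user_id) avoids counting a user once per order.
query = """
SELECT u.country,
       COUNT(DISTINCT u.user_id)  AS n_users,
       COUNT(o.order_id)          AS n_orders,
       COALESCE(SUM(o.amount), 0) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
GROUP BY u.country
ORDER BY u.country;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A common follow-up is what changes with an INNER JOIN instead: users without any orders would drop out of the counts entirely.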
1a: W3Schools: this is more of a SQL syntax lookup page, but its tutorial is quite helpful for beginners.
https://www.w3schools.com/sql/
1b: The official Facebook product data scientist interview guide includes these 3 links, and I found them very helpful:
SQL Course
http://www.sqlcourse2.com/
Programmer Interview SQL Practice Database  <= I find this one pretty helpful
https://www.programmerinterview. ... terview-question-1/
Mode Analytics SQL Tutorials
https://mode.com/sql-tutorial/introduction-to-sql
1c: LeetCode: https://leetcode.com/
I highly recommend practicing all the SQL questions on LeetCode (around 30 questions; one full round takes about 5 to 15 hours) at least twice. They are very common interview questions covering topics such as self joins, ranking, and subqueries.
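To make those three patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module on a hypothetical LeetCode-style `employee` table (the table, names, and salaries are made up). On engines with window functions (SQLite 3.25+, PostgreSQL, MySQL 8, etc.), `DENSE_RANK() OVER (ORDER BY salary DESC)` is the more common way to write the ranking; a correlated subquery is used here so the sketch does not depend on which SQLite version your Python ships with.

```python
import sqlite3

# Hypothetical table: each employee may point to a manager in the same table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER);
INSERT INTO employee VALUES
  (1, 'Ann',  70000, NULL),
  (2, 'Bob',  80000, 1),
  (3, 'Cara', 60000, 1),
  (4, 'Dan',  80000, 2),
  (5, 'Eve',  50000, 2);
""")

# Self join: join the table to itself so each row can see its manager's row.
above_manager = conn.execute("""
    SELECT e.name
    FROM employee e
    JOIN employee m ON e.manager_id = m.id
    WHERE e.salary > m.salary
    ORDER BY e.name
""").fetchall()

# Subquery: second-highest distinct salary (None if it does not exist).
second_highest = conn.execute("""
    SELECT MAX(salary) FROM employee
    WHERE salary < (SELECT MAX(salary) FROM employee)
""").fetchone()[0]

# Ranking: dense rank by salary, computed by counting the distinct salaries
# at or above the current row's salary (ties share a rank, no gaps).
ranked = conn.execute("""
    SELECT e.name, e.salary,
           (SELECT COUNT(DISTINCT e2.salary)
            FROM employee e2
            WHERE e2.salary >= e.salary) AS salary_rank
    FROM employee e
    ORDER BY salary_rank, e.name
""").fetchall()

print(above_manager)
print(second_highest)
print(ranked)
```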
1d: HackerRank SQL questions are quite good.
https://www.hackerrank.com/
It is LeetCode's twin. If the company you're interviewing with gives live coding questions on HackerRank, be sure to familiarize yourself with the platform. There are a lot of SQL questions to practice there.
2. A/B testing:
If the company is a social media firm or a firm with front-end products, its teams may use A/B testing A LOT.
This Udacity course is the best A/B testing course ever. Please budget a month for it:
https://www.udacity.com/course/ab-testing--ud257
There are many good articles on Medium as well:
https://towardsdatascience.com/
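To make the statistics concrete, here is a minimal sketch of two calculations that commonly come up in A/B testing questions: a two-sided two-proportion z-test on conversion rates, and an approximate sample size per group. It uses only the Python standard library (`statistics.NormalDist`, Python 3.8+). The conversion numbers are made up, the function names are my own, and the course may present these formulas somewhat differently (for example, pooled versus unpooled variance in the sample-size formula).

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users per group to detect p_base -> p_target (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


# Made-up example: control converts 200/10,000, treatment 250/10,000.
z, p = two_proportion_ztest(200, 10_000, 250, 10_000)
print(f"z = {z:.2f}, p-value = {p:.4f}")

n = sample_size_per_group(0.02, 0.025)
print(f"about {n} users per group to detect 2.0% -> 2.5% at alpha=0.05, power=0.8")
```

In practice the sample size is usually fixed before the experiment starts, rather than stopping as soon as the p-value dips below the threshold.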
3. Machine Learning (ML):
Not all data scientist jobs require deep knowledge of machine learning. Machine learning engineer roles usually require deeper ML knowledge, but it all depends on the company you join. In my experience, most jobs only required basic knowledge of the machine learning algorithms listed on your resume, or of basic ones like k-means clustering, logistic regression, linear regression, PCA, random forests, and supervised vs. unsupervised learning.
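As an example of the "basic knowledge" level, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm): alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The function name, the initialization (k random data points), and the toy points are my own assumptions; in practice you would use a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


# Two obvious groups of made-up 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Interview follow-ups often touch on the known weak spots this sketch shares with the textbook algorithm: sensitivity to initialization, the need to choose k, and reliance on Euclidean distance.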
A lot of people recommend this ML course (Coursera). I find that it goes very deep into the principles of neural networks, but it didn't really help in most of my interviews.
https://www.coursera.org/learn/machine- ... g#syllabus
Also, companies usually test how you would build a machine learning model, either orally or through a take-home project. The machine learning itself is the easy part; data cleaning, feature engineering, and data sampling are usually the difficult parts. For feature engineering, companies often want you to come up with unique features or features grounded in the business context. For evaluating machine learning results, AUC (area under the ROC curve), F1 score, precision, and recall are asked about very often, actually far more often than the ML algorithms listed above.
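Since these metrics come up so often, it helps to be able to compute them by hand. Below is a minimal pure-Python sketch; the labels, scores, and the 0.5 threshold are made up for illustration. AUC is computed through its probabilistic interpretation (the chance that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half), which is O(n_pos × n_neg) and only meant for small examples. In practice, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` in scikit-learn's `sklearn.metrics` do this for you.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores; hard predictions use a 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Note that precision, recall, and F1 change with the threshold you pick, while AUC summarizes how well the scores rank positives above negatives across all thresholds; being able to explain that difference is worth practicing.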
4. Python/R:
Python has more advantages when it comes to connecting with software engineering projects and APIs, and it can also handle a lot of data science/ML projects, so it is favored by many tech companies, especially on the US West Coast. R is used more in academia and in some financial/biotech companies on the East Coast, where more advanced or rare statistical methods are required. Both Python (through a library called 'pandas') and R have the concept of a data frame, which is crucial for data processing. That is important to learn.
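Here is a minimal sketch of the data frame idea in pandas: converting a column's type, merging two frames (the pandas counterpart of a SQL JOIN), and aggregating with `groupby`. The toy data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; amounts arrive as strings, as they often do from raw CSV exports.
orders = pd.DataFrame({"user_id": [1, 1, 2, 3], "amount": ["20.0", "5.0", "7.5", "12.5"]})
users = pd.DataFrame({"user_id": [1, 2, 3], "country": ["US", "US", "CA"]})

orders["amount"] = orders["amount"].astype(float)       # data type transformation
merged = orders.merge(users, on="user_id", how="left")  # like a SQL LEFT JOIN

# Named aggregation: one row per country with an order count and total revenue.
summary = (
    merged.groupby("country", as_index=False)
    .agg(n_orders=("amount", "count"), revenue=("amount", "sum"))
    .sort_values("country")
    .reset_index(drop=True)
)
print(summary)
```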
Also, it is good to pick up Python from any online course and then learn by doing small (data science) projects. By doing a project, you will learn the following key parts of Python for data processing:
Basic concepts: functions, libraries, data type conversion, object-oriented programming.
Python packages: scikit-learn (for ML), pandas, Jupyter Notebook, NumPy, SciPy, and dependency/library management tools such as Miniconda.
Version control: GitHub
Also, I highly recommend using (the free version of) PyCharm for Python code development. For data science projects, a lot of people use Jupyter Notebook to share graphs.
5. Take-home challenges:
I highly recommend this book: https://datamasked.com/
Two experts have posted solutions to the book's problems on GitHub:
https://github.com/JifuZhao/DS-Take-Home/
https://github.com/stasi009/TakeHomeDataChallenges
6. More:
The world of data science is way too broad. The following keywords are good to have and good to know, depending on which companies/teams you are going to work with. They are good to have especially if you are mid-career, but not a must if you are a new grad.
Software engineering: CI/CD, QA (unit tests, integration tests), Amazon Web Services (S3 buckets, EC2 instances, etc.), configuration management, APIs
Data engineering: data warehouses, ETL, NoSQL, Spark, Scala, graph databases
Machine learning: TensorFlow, PyTorch, Apache Flink
Front end: JavaScript, React
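7. There are also some great posts on 1point3acres with a lot of valuable information:
[Final version, giving back to the community] New grads: a super-detailed recap of a Bay Area DA/DS job search, plus interview experiences and a resource roundup!!! https://www.1point3acres.com/bbs/thread-469408-1-1.html
A summary of how to prepare for data science analytics interviews, with detailed case study walkthroughs
https://www.1point3acres.com/bbs/thread-330947-1-1.html
[Workplace reflections] On startups: http://www.1point3acres.com/bbs/thread-421210-1-1.html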
I'll write more when I think of it. Please send rice (1point3acres forum points)!