
Top 25 Data Science Interview Questions and Answers

1) How would you create a taxonomy to identify key customer trends in unstructured data?
The best way to approach this question is to say that you would first check with the business owner to understand their objectives before categorizing the data. After that, follow an iterative approach: pull new data samples, improve the model accordingly, and validate its accuracy by soliciting feedback from the business stakeholders. This helps ensure that the model produces actionable results and improves over time.
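As a rough illustration of the categorization step, here is a minimal sketch of a keyword-based taxonomy applied to customer comments. The categories, keywords, and the function names `categorize` and `summarize` are hypothetical: in practice the categories would come from the conversation with the business owner, and the keyword lists would be revised on each iteration based on stakeholder feedback. The "uncategorized" count is one simple signal of where the taxonomy needs to grow.

```python
import re
from collections import Counter

# Hypothetical starter taxonomy agreed with the business owner:
# category -> keywords that signal it. Revise these on every iteration.
TAXONOMY = {
    "delivery": {"late", "shipping", "delivery", "arrived"},
    "pricing": {"price", "expensive", "cheap", "discount"},
    "quality": {"broken", "quality", "defective", "durable"},
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the sorted categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keywords in taxonomy.items() if words & keywords)


def summarize(comments, taxonomy=TAXONOMY):
    """Count comments per category; 'uncategorized' marks gaps in the taxonomy."""
    counts = Counter()
    for comment in comments:
        counts.update(categorize(comment, taxonomy) or ["uncategorized"])
    return counts


comments = [
    "Delivery was late again",
    "Great price, fast shipping",
    "The app keeps crashing",
]
print(summarize(comments))
```

A comment can fall into several categories at once; a growing "uncategorized" share after a new data pull is a cue to go back to the stakeholders.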
2) Python or R: which one would you prefer for text analytics?
The best possible answer is Python, because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools.
3) Which technique is used to predict categorical responses?
Classification techniques are used to predict categorical responses. They are widely used in data mining to assign the records of a data set to categories.
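Classification covers many algorithms (logistic regression, decision trees, k-nearest neighbours, and others). As one small, self-contained example, the sketch below implements k-nearest neighbours in plain Python: a new record gets the majority label of the k closest labelled records. The function name `knn_predict`, the choice of Euclidean distance, and the toy customer data are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote of its k nearest neighbours.

    train: list of (feature_tuple, label) pairs; query: feature tuple.
    """
    # Sort labelled records by Euclidean distance to the query and keep k of them.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (years as customer, support tickets last month) -> response.
train = [
    ((5.0, 0.0), "stays"),
    ((6.0, 1.0), "stays"),
    ((4.5, 1.0), "stays"),
    ((0.5, 6.0), "churns"),
    ((1.0, 5.0), "churns"),
    ((0.8, 7.0), "churns"),
]
print(knn_predict(train, (5.5, 0.5)))
print(knn_predict(train, (0.7, 6.5)))
```

The response here is categorical ("stays" or "churns"), which is what distinguishes classification from regression.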
4) What is logistic regression? Or: give an example of when you have used logistic regression recently.
Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables. It passes that linear combination through the logistic (sigmoid) function, p = 1 / (1 + e^-(b0 + b1x1 + ... + bnxn)), to obtain the probability of the outcome. For example, suppose you want to predict whether a particular political leader will win an election. The outcome of the prediction is binary, i.e. 1 (win) or 0 (lose). The predictor variables here could be the amount of money spent on the candidate's election campaign, the amount of time spent campaigning, etc.
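The sketch below fits a logistic regression by batch gradient descent on the log-loss, in plain Python so every step is visible. The two predictors mirror the election example (campaign spend and campaign duration), but the numbers are made up, and the learning rate and number of epochs are illustrative assumptions rather than recommended settings. In practice a library class such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(bias, weights, x):
    """Probability of outcome 1 (win) for one candidate's predictors x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(X, y, lr=0.05, epochs=20000):
    """Fit (bias, weights) by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            # For the log-loss, the gradient w.r.t. the linear score is (p - y).
            error = predict_proba(bias, weights, xi) - yi
            grad_b += error
            for j in range(n_features):
                grad_w[j] += error * xi[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


# Hypothetical past candidates: [campaign spend (millions), months campaigning].
X = [[1.0, 2.0], [1.5, 1.0], [2.0, 3.0], [6.0, 8.0], [7.0, 6.0], [8.0, 9.0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = won, 0 = lost

bias, weights = fit_logistic(X, y)
p = predict_proba(bias, weights, [5.0, 6.0])
print(f"estimated probability of winning: {p:.2f}")
print("predicted outcome:", 1 if p >= 0.5 else 0)
```

The model outputs a probability; the 0.5 cut-off that turns it into a win/lose prediction is a choice, and it can be moved if one kind of error is costlier than the other.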
5) What are Recommender Systems?
Recommender systems are a subclass of information filtering systems meant to predict the preference or rating that a user would give to a product. They are widely used for movies, news, research articles, products, social tags, music, etc.
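One common family of recommender systems is collaborative filtering, which predicts a user's rating for an item from the ratings of similar users. The sketch below is a minimal user-based version using cosine similarity over co-rated items; the ratings data, user and item names, and the function names are hypothetical, and production systems typically add refinements such as mean-centering each user's ratings.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale.
ratings = {
    "alice": {"Inception": 5, "Titanic": 1, "Matrix": 5},
    "bob": {"Inception": 5, "Titanic": 1, "Matrix": 4, "Alien": 5},
    "carol": {"Inception": 1, "Titanic": 5, "Alien": 1},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item, or None."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


print(predict_rating(ratings, "alice", "Alien"))
```

Alice's tastes match Bob's far more closely than Carol's, so Bob's high rating for "Alien" dominates the prediction.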
6) Why does data cleaning play a vital role in analysis?
Cleaning data from multiple sources and transforming it into a format that data analysts or data scientists can work with is a cumbersome process. As the number of data sources increases, the time taken to clean the data grows rapidly, because of both the number of sources and the volume of data those sources generate. Cleaning alone can take up to 80% of the time spent on an analysis, which makes it a critical part of the task.
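To make the cleaning step concrete, here is a minimal sketch that merges customer records from two hypothetical sources that disagree on formatting: e-mail addresses with stray whitespace and capitals, dates in different formats, and a missing value. The field names, the date formats in `DATE_FORMATS` (the second one assumed to be day-first), and the rule of keeping the first record per e-mail address are all assumptions for illustration.

```python
from datetime import datetime

# Hypothetical exports from two systems that format the same fields differently.
crm_records = [{"email": " Alice@Example.com ", "signup": "2023-01-05"}]
web_records = [
    {"email": "alice@example.com", "signup": "05/01/2023"},
    {"email": "bob@example.com", "signup": ""},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats used by the two sources


def parse_date(value):
    """Try each known format; return a date, or None if missing or unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize e-mails and dates, then drop duplicates, keeping the first record per e-mail."""
    cleaned = {}
    for rec in records:
        email = rec["email"].strip().lower()
        if email and email not in cleaned:
            cleaned[email] = {"email": email, "signup": parse_date(rec["signup"])}
    return list(cleaned.values())


for row in clean(crm_records + web_records):
    print(row)
```

Every extra source adds its own formats and quirks to rules like these, which is one reason cleaning effort grows as sources are added.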
7) Differentiate between univariate, bivariate and multivariate analysis.
These are descriptive statistical analysis techniques that can be differentiated by the number of variables involved at a given time. For example, a pie chart of sales by territory involves only one variable and can be referred to as univariate analysis.
If the analysis attempts to understand the relationship between two variables at a time, as in a scatterplot, it is referred to as bivariate analysis. For example, analysing sales volume against spending is an example of bivariate analysis.
Analysis that deals with the study of more than two variables to understand the effect of variables on the responses is referred to as multivariate analysis.
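The univariate and bivariate cases can be sketched with Python's standard library: a univariate summary describes one variable on its own, while a Pearson correlation coefficient measures how two variables move together (the numerical counterpart of a scatterplot). The spend and sales figures and the function names are hypothetical. A multivariate analysis, such as a regression with several predictors, is left out to keep the sketch short.

```python
import math
import statistics

# Hypothetical monthly figures for five months.
ad_spend = [10, 12, 15, 18, 20]
sales = [100, 115, 140, 170, 185]


def describe(values):
    """Univariate: summarize a single variable on its own."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(x, y):
    """Bivariate: Pearson correlation between two equal-length variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print("sales summary:", describe(sales))
print("spend vs sales correlation:", round(pearson(ad_spend, sales), 3))
```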
8) What do you understand by the term Normal Distribution?
Data can be distributed in different ways: skewed to the left or to the right, or jumbled up with no clear pattern. However, data can also be distributed around a central value with no bias to the left or right, forming a bell-shaped curve; this is the normal distribution. The values of the random variable are spread symmetrically around the mean, so the bell-shaped curve is a mirror image on either side of it.
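A normal distribution with mean μ and standard deviation σ has density f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The sketch below codes that formula directly, checks it against `statistics.NormalDist` from Python's standard library (available since Python 3.8), and shows the familiar rule that about 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean. The function name `normal_pdf` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (bell-shaped) curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


dist = NormalDist(mu=0.0, sigma=1.0)

# The hand-written formula agrees with the standard library.
assert math.isclose(normal_pdf(0.7), dist.pdf(0.7))

# Symmetry: no bias to the left or right of the mean.
assert math.isclose(normal_pdf(-1.5), normal_pdf(1.5))

# Share of values within k standard deviations of the mean.
for k in (1, 2, 3):
    share = dist.cdf(k) - dist.cdf(-k)
    print(f"within {k} sigma: {share:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```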
9) What is Linear Regression?
Linear regression is a statistical technique in which the score of a variable Y is predicted from the score of a second variable X, assuming a straight-line relationship between them. X is referred to as the predictor variable and Y as the criterion variable.
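For a single predictor, the least-squares line y = a + bx has a closed-form solution: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The sketch below implements it in plain Python; the function name `fit_line` and the hours/score data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data: X = hours studied (predictor), Y = test score (criterion).
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]
a, b = fit_line(hours, scores)
print(f"score = {a:.1f} + {b:.1f} * hours")
print("predicted score for 6 hours:", round(a + b * 6, 1))
```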
10) What are Interpolation and Extrapolation?
Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value beyond the range of known values by extending a known set of values or facts.
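Both ideas can be illustrated with straight-line (linear) estimation, which is one common choice among several (polynomial or spline methods are others). The sketch below takes a sorted list of known (x, y) points: inside their range it interpolates between the two bracketing points, and outside it extrapolates the line through the two nearest end points. The function name `estimate` and the sample values are hypothetical, and the x values are assumed to be distinct.

```python
import bisect


def estimate(points, x):
    """Linear estimate of y at x from known (x, y) points sorted by distinct x.

    Inside the known range this interpolates between the two bracketing points;
    outside it extrapolates the line through the two nearest end points.
    """
    xs = [px for px, _ in points]
    if xs[0] <= x <= xs[-1]:
        # Interpolation: find the first known x >= target and pair it with its predecessor.
        i = max(1, bisect.bisect_left(xs, x))
        (x0, y0), (x1, y1) = points[i - 1], points[i]
    elif x < xs[0]:
        (x0, y0), (x1, y1) = points[0], points[1]
    else:
        (x0, y0), (x1, y1) = points[-2], points[-1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical yearly sales figures.
known = [(2019, 90), (2020, 100), (2022, 120)]
print(estimate(known, 2021))  # interpolation: between 2020 and 2022
print(estimate(known, 2024))  # extrapolation: beyond the last known year
```

Extrapolated values deserve more caution than interpolated ones, since they assume the trend continues outside the data that was actually observed.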

