Skip to main content

Top 25 Data Science Interview Questions and Answers

1) How would you create a taxonomy to identify key customer trends in unstructured data?
The best way to approach this question is to mention that it is good to check with the business owner and understand their objectives before categorizing the data. Having done this, it is always good to follow an iterative approach by pulling new data samples and improving the model accordingly by validating it for accuracy by soliciting feedback from the stakeholders of the business. This helps ensure that your model is producing actionable results and improving over the time.
2) Python or R – Which one would you prefer for text analytics?
The best possible answer for this would be Python because it has Pandas library that provides easy to use data structures and high performance data analysis tools.
3) Which technique is used to predict categorical responses?
Classification technique is used widely in mining for classifying data sets.
4) What is logistic regression? Or State an example when you have used logistic regression recently.
Logistic Regression often referred as logit model is a technique to predict the binary outcome from a linear combination of predictor variables. For example, if you want to predict whether a particular political leader will win the election or not. In this case, the outcome of prediction is binary i.e. 0 or 1 (Win/Lose). The predictor variables here would be the amount of money spent for election campaigning of a particular candidate, the amount of time spent in campaigning, etc.
5) What are Recommender Systems?
A subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product. Recommender systems are widely used in movies, news, research articles, products, social tags, music, etc.
6) Why data cleaning plays a vital role in analysis?
Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work with is a cumbersome process because - as the number of data sources increases, the time take to clean the data increases exponentially due to the number of sources and the volume of data generated in these sources. It might take up to 80% of the time for just cleaning data making it a critical part of analysis task.
7) Differentiate between univariate, bivariate and multivariate analysis.
These are descriptive statistical analysis techniques which can be differentiated based on the number of variables involved at a given point of time. For example, the pie charts of sales based on territory involve only one variable and can be referred to as univariate analysis.
If the analysis attempts to understand the difference between 2 variables at time as in a scatterplot, then it is referred to as bivariate analysis. For example, analysing the volume of sale and a spending can be considered as an example of bivariate analysis.
Analysis that deals with the study of more than two variables to understand the effect of variables on the responses is referred to as multivariate analysis.
8) What do you understand by the term Normal Distribution?
Data is usually distributed in different ways with a bias to the left or to the right or it can all be jumbled up. However, there are chances that data is distributed around a central value without any bias to the left or right and reaches normal distribution in the form of a bell shaped curve. The random variables are distributed in the form of an symmetrical bell shaped curve.
 9)What is Linear Regression?
Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
10)What is Interpolation and Extrapolation?
Estimating a value from 2 known values from a list of values is Interpolation. Extrapolation is approximating a value by extending a known set of values or facts.

Comments

Popular posts from this blog

Automation Testing Python Interview Questions

1.What is Python? What are the benefits of using Python? Answer: Python is a programming language with objects, modules, threads, exceptions and automatic memory management. The benefits of pythons are that it is simple and easy, portable, extensible, build-in data structure and it is an open source. 2.   How Python can be used in software testing?  Answer: 1. To generate test data; parse test results; generate reports; testing API calls etc. 2. Python to extract requirements from a Word document. 3. For testing tasks automation, setting up environments for tests, extracting performance data, etc… 4. Testers use Python extensively in many companies with Selenium for test automation. 5. For writing desktop applications used by testers. 6. Test data manipulation. 7. To build test environment 8. Testing with IronPython on .NET  3.What Python frameworks do you know?  Answer:  Framework called Web2py, PAMIE (Python automation Module for I. E.), The p...

Python Flask Interview Questions and Answers

Q1). What is Flask? Ans1 : Flask is a web micro framework for  Python  based on “Werkzeug, Jinja 2 and good intentions” BSD licensed. Werkzeug and jingja are two of its dependencies. Q2). What is the benefit of using Flask? Ans2:  Flask is part of the micro-framework. Which means it will have little to no dependencies on external libraries. It makes the framework light while there is little dependency to update and less security bugs. Q3). What is the difference between  Django,  Pyramid, and Flask? Ans3:  Flask is a “microframework” primarily build for a small application with simpler requirements. In flask, you have to use external libraries. Flask is ready to use.Pyramid are build for larger applications. It provides flexibility and lets the developer use the right tools for their project. The developer can choose the database, URL structure, templating style and more. Pyramid is heavy configurable. Q4). What is Flask-WTF and what are t...

Python real time Interview Questions and Answers

Q1).What is Python? Ans1:  Python is a high-level, interpreted, interactive and object-oriented scripting language.  Python  is designed to be highly readable. It uses English keywords frequently where as other languages use punctuation, and it h as fewer syntactical constructions than other languages. Q2).Name some of the features of Python. Ans2:  Following are some of the salient features of python − It supports functional and structured programming methods as well as OOP. It can be used as a scripting language or can be compiled to byte-code for building large applications. It provides very high-level dynamic data types and supports dynamic type checking. It supports automatic garbage collection. It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java. Q3).Do you have any personal projects? Really? Ans3: This shows that you are willing to do more than the bare minimum in terms of keeping your skillset up to date. If you work on per...