Saturday, March 30, 2024

OUTLIERS (Hair, 2013)

Outliers are observations with a unique combination of characteristics identifiable as distinctly different from the other observations. An outlier is an unusually high or low value on a variable, or a unique combination of values across several variables, that makes the observation stand out from the others.


In assessing the impact of outliers, we must weigh both practical and substantive considerations:
  • From a practical standpoint, outliers can have a marked effect on any type of empirical analysis.
  • In substantive terms, the outlier must be viewed in light of how representative it is of the population.
Outliers cannot be categorically characterized as either beneficial or problematic, but instead must be viewed within the context of the analysis and should be evaluated by the types of information they may provide. 
  • When beneficial, outliers—although different from the majority of the sample— may be indicative of characteristics of the population that would not be discovered in the normal course of analysis. 
  • In contrast, problematic outliers are not representative of the population, are counter to the objectives of the analysis, and can seriously distort statistical tests.

Owing to the varying impact of outliers, it is imperative that the researcher examine the data for the presence of outliers and ascertain their type of influence. Additionally, outliers should be placed in a framework particularly suited for assessing the influence of individual observations and determining whether this influence is helpful or harmful.

METHODS OF DETECTING OUTLIERS

  • Univariate Detection. 
The univariate identification of outliers examines the distribution of observations for each variable in the analysis and selects as outliers those cases falling at the outer ranges (high or low) of the distribution. The primary issue is establishing the threshold for designation of an outlier. The typical approach first converts the data values to standard scores, which have a mean of 0 and a standard deviation of 1. Because the values are expressed in a standardized format, comparisons across variables can be made easily.
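
As a quick illustration, the standardization step can be sketched in a few lines of Python. The data and the cutoff here are only illustrative: |z| > 2.5 is a common rule of thumb for small samples, while larger samples often use thresholds of 3 to 4.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.5):
    """Flag values whose standard score exceeds the threshold.

    A cutoff of |z| > 2.5 is a common rule of thumb for small
    samples; larger samples often use 3 to 4.
    """
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) > threshold]

# Hypothetical data: eleven typical values and one extreme value
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 40]
print(zscore_outliers(data))  # -> [40]
```

Because standard scores put every variable on the same mean-0, standard-deviation-1 scale, the same threshold can be applied to all variables in the analysis.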

(Source: https://ai-ml-analytics.com/outlier-detection/)

 

In either case, the researcher must recognize that a certain number of observations may occur normally in these outer ranges of the distribution. The researcher should strive to identify only those truly distinctive observations and designate them as outliers.

  • Bivariate Detection. 

In addition to the univariate assessment, pairs of variables can be assessed jointly through a scatterplot. Cases that fall markedly outside the range of the other observations will be seen as isolated points in the scatterplot. To assist in determining the expected range of observations in this two-dimensional portrayal, an ellipse representing a bivariate normal distribution’s confidence interval (typically set at the 90% or 95% level) is superimposed over the scatterplot. This ellipse provides a graphical portrayal of the confidence limits and facilitates identification of the outliers. A variant of the scatterplot is termed the influence plot, with each point varying in size in relation to its influence on the relationship.


(Source: https://ouzhang.me/blog/outlier-series/outliers-part4/)

 

Each of these methods provides an assessment of the uniqueness of each observation in relation to the other observations based on a specific pair of variables. A drawback of bivariate methods in general is the potentially large number of scatterplots that arise as the number of variables increases: all pairwise comparisons of p variables require p(p-1)/2 plots. For three variables, only three graphs are needed, but five variables take 10 graphs, and 10 variables take 45 scatterplots! As a result, the researcher should limit the general use of bivariate methods to specific relationships between variables, such as the relationship of the dependent versus independent variables in regression. The researcher can then examine the set of scatterplots and identify any general pattern of one or more observations that would result in their designation as outliers.
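
The growth in the number of pairwise plots follows p(p-1)/2 and can be checked directly:

```python
from math import comb

# Pairwise scatterplots needed for p variables: C(p, 2) = p * (p - 1) / 2
for p in (3, 5, 10):
    print(p, "variables ->", comb(p, 2), "scatterplots")
# -> 3, 10, and 45 plots, matching the counts above
```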

  • Multivariate Detection. 
Because most multivariate analyses involve more than two variables, the bivariate methods quickly become inadequate for several reasons. First, they require a large number of graphs, as discussed previously, when the number of variables reaches even moderate size. Second, they are limited to two dimensions (variables) at a time. Yet when more than two variables are considered, the researcher needs a means to objectively measure the multidimensional position of each observation relative to some common point.

(Source: https://blogs.sas.com/content/iml/2019/03/25/geometry-multivariate-univariate-outliers.html)

 

This issue is addressed by the Mahalanobis D2 measure, a multivariate assessment of each observation across a set of variables. This method measures each observation’s distance in multidimensional space from the mean center of all observations, providing a single value for each observation no matter how many variables are considered. Higher D2 values represent observations farther removed from the general distribution of observations in this multidimensional space. This method, however, also has the drawback of only providing an overall assessment, such that it provides no insight as to which particular variables might lead to a high D2 value.

(Source: Hair, 2013)
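
A minimal sketch of D2 for the two-variable case follows; the data are hypothetical, and the general p-variable measure requires inverting the full covariance matrix (typically with a numerical library) rather than the hand-coded 2x2 inverse used here.

```python
from statistics import mean

def mahalanobis_d2(data):
    """Squared Mahalanobis distance of each (x, y) observation from
    the centroid, using the sample covariance matrix (two variables)."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    mx, my = mean(xs), mean(ys)
    n = len(data)
    # Sample covariance entries (denominator n - 1)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2          # invert the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
            for dx, dy in ((x - mx, y - my) for x, y in data)]

# Hypothetical data: six clustered points plus one aberrant observation
points = [(1, 1), (2, 2), (3, 1), (2, 3), (3, 3), (2, 2), (9, 0)]
d2 = mahalanobis_d2(points)
print(d2.index(max(d2)))  # -> 6, the observation farthest from the centroid
```

As the text notes, D2 condenses all variables into one distance per observation, so it flags the aberrant case but gives no hint as to which variable is responsible.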

 

RETENTION OR DELETION OF THE OUTLIER 

After the outliers are identified, profiled, and categorized, the researcher must decide on the retention or deletion of each one. Many philosophies among researchers offer guidance as to how to deal with outliers. Our belief is that they should be retained unless demonstrable proof indicates that they are truly aberrant and not representative of any observations in the population. If they do portray a representative element or segment of the population, they should be retained to ensure generalizability to the entire population. As outliers are deleted, the researcher runs the risk of improving the multivariate analysis but limiting its generalizability. If outliers are problematic in a particular technique, many times they can be accommodated in the analysis in a manner in which they do not seriously distort the analysis.


Source:

  •  Hair, J. F. (2009). Multivariate data analysis.

Thursday, March 28, 2024

A RECAP OF THE MISSING VALUE ANALYSIS (Hair, 2013)


(Source: https://www.researchgate.net/publication/329398079_The_Sin_of_Missing_Data_Is_All_Forgiven_by_Way_of_Imputation/figures?lo=1)

 

Evaluation of the issues surrounding missing data in the data set can be summarized in four conclusions:

  • The missing data process is MCAR. 

All of the diagnostic techniques support the conclusion that no systematic missing data process exists, making the missing data MCAR (missing completely at random). Such a finding provides two advantages to the researcher. First, it should not involve any hidden impact on the results that need to be considered when interpreting the results. Second, any of the imputation methods can be applied as remedies for the missing data. Their selection need not be based on their ability to handle nonrandom processes, but instead on the applicability of the process and its impact on the results.  

  • Imputation is the most logical course of action. 

Even given the benefit of deleting cases and variables, the researcher is precluded from the simple solution of using the complete case method, because it results in an inadequate sample size. Some form of imputation is therefore needed to maintain an adequate sample size for any multivariate analysis.

  • Imputed correlations differ across techniques.

 When estimating correlations among the variables in the presence of missing data, the researcher can choose from four commonly employed techniques: the complete case method, the all-available information method, the mean substitution method, and the EM method. The researcher is faced in this situation, however, with differences in the results among these methods. The all-available information, mean substitution, and EM approaches lead to generally consistent results. Notable differences, however, are found between these approaches and the complete information approach. Even though the complete information approach would seem the most “safe” and conservative, in this case it is not recommended due to the small sample used (only 26 observations) and its marked differences from the other two methods. The researcher should, if necessary, choose among the other approaches.

  • Multiple methods for replacing the missing data are available and appropriate. 

Mean substitution is one acceptable means of generating replacement values for the missing data. The researcher also has available the regression and EM imputation methods, each of which gives reasonably consistent estimates for most variables. The presence of several acceptable methods also enables the researcher to combine the estimates into a single composite, hopefully mitigating any effects strictly due to one of the methods.
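
Mean substitution is simple enough to sketch in a few lines; the data here are hypothetical, with None marking a missing entry.

```python
from statistics import mean

def mean_substitute(values):
    """Replace missing entries (None) with the mean of the valid values."""
    valid = [v for v in values if v is not None]
    fill = mean(valid)
    return [fill if v is None else v for v in values]

scores = [4.0, 5.0, None, 3.0, None, 4.0]
print(mean_substitute(scores))  # -> [4.0, 5.0, 4.0, 3.0, 4.0, 4.0]
```

Note that every missing entry receives the same value, which is why mean substitution preserves the variable's mean but understates its variance.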


Source:

  •  Hair, J. F. (2009). Multivariate data analysis.

Tuesday, March 26, 2024

Missing Data (Hair, 2013)

 A Four-Step Process for Identifying Missing Data


Source: Hair, J. F. (2009). Multivariate data analysis.

STEP 1: DETERMINE THE TYPE OF MISSING DATA 

The first step in any examination of missing data is to determine the type of missing data involved. Here the researcher is concerned whether the missing data are part of the research design and under the control of the researcher or whether the “causes” and impacts are truly unknown. Let’s start with the missing data that are part of the research design and can be handled directly by the researcher.

    • Ignorable Missing Data:  The justification for designating missing data as ignorable is that the missing data process is operating at random (i.e., the observed values are a random sample of the total set of values, observed and missing) or explicitly accommodated in the technique used. There are three instances in which a researcher most often encounters ignorable missing data.
      • The first example encountered in almost all surveys and most other data sets is the ignorable missing data process resulting from taking a sample of the population rather than gathering data from the entire population.  In these instances, the missing data are those observations in a population that are not included when taking a sample. The purpose of multivariate techniques is to generalize from the sample observations to the entire population, which is really an attempt to overcome the missing data of observations not in the sample. The researcher makes these missing data ignorable by using probability sampling to select respondents. Probability sampling enables the researcher to specify that the missing data process leading to the omitted observations is random and that the missing data can be accounted for as sampling error in the statistical procedures. Thus, the missing data of the nonsampled observations are ignorable.
      • A second instance of ignorable missing data is due to the specific design of the data collection process. Certain nonprobability sampling plans are designed for specific types of analysis that accommodate the nonrandom nature of the sample. Much more common are missing data due to the design of the data collection instrument, such as through skip patterns where respondents skip sections of questions that are not applicable.
      • A third type of ignorable missing data occurs when the data are censored. Censored data are observations not complete because of their stage in the missing data process. A typical example is an analysis of the causes of death. Respondents who are still living cannot provide complete information (i.e., cause or time of death) and are thus censored. 
       
STEP 2: DETERMINE THE EXTENT OF MISSING DATA

The primary issue in this step of the process is to determine whether the extent or amount of missing data is low enough to not affect the results, even if it operates in a nonrandom manner. If it is sufficiently low, then any of the approaches for remedying missing data may be applied. If the missing data level is not low enough, then we must first determine the randomness of the missing data process before selecting a remedy (step 3).

How Much Missing Data Is Too Much?


Source: Hair, J. F. (2009). Multivariate data analysis.

Assessing the Extent and Patterns of Missing Data

The most direct means of assessing the extent of missing data is by tabulating (1) the percentage of variables with missing data for each case and (2) the number of cases with missing data for each variable. This simple process identifies not only the extent of missing data, but any exceptionally high levels of missing data that occur for individual cases or observations. The researcher should look for any nonrandom patterns in the data, such as concentration of missing data in a specific set of questions, attrition in not completing the questionnaire, and so on. Finally, the researcher should determine the number of cases with no missing data on any of the variables, which will provide the sample size available for analysis if remedies are not applied.
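
The two tabulations and the complete-case count described above can be sketched directly; the records below are hypothetical, with None marking a missing entry.

```python
def missing_summary(cases):
    """Tabulate missing data (None) by case and by variable.

    `cases` is a list of equal-length records (rows = cases,
    columns = variables).
    """
    n_vars = len(cases[0])
    pct_by_case = [100 * row.count(None) / n_vars for row in cases]
    count_by_var = [sum(row[j] is None for row in cases)
                    for j in range(n_vars)]
    complete = sum(None not in row for row in cases)  # complete-case sample size
    return pct_by_case, count_by_var, complete

rows = [[1, 2, 3], [1, None, 3], [None, None, 3], [1, 2, 3]]
pct, by_var, complete = missing_summary(rows)
print(by_var)    # -> [1, 2, 0] missing counts per variable
print(complete)  # -> 2 cases with no missing data
```

The `complete` count is exactly the sample size that would remain if no remedy were applied, which is why it is worth computing before choosing an approach.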

If it is determined that the extent is acceptably low and no specific nonrandom patterns appear, then the researcher can employ any of the imputation techniques (step 4) without biasing the results in any appreciable manner. If the level of missing data is too high, then the researcher must consider specific approaches to diagnosing the randomness of the missing data processes (step 3) before proceeding to apply a remedy.

 

Deletions Based on Missing Data

Source: Hair, J. F. (2009). Multivariate data analysis.

Imputation of Missing Data

Source: Hair, J. F. (2009). Multivariate data analysis.


STEP 3: DIAGNOSE THE RANDOMNESS OF THE MISSING DATA PROCESSES

   Levels of Randomness of the Missing Data Process
    • Missing At Random, or MAR
Missing data are termed missing at random (MAR) if the missing values of Y depend on X, but not on Y. In other words, the observed Y values represent a random sample of the actual Y values for each value of X, but the observed data for Y do not necessarily represent a truly random sample of all Y values. Even though the missing data process is random in the sample, its values are not generalizable to the population. Most often, the data are missing randomly within subgroups, but differ in levels between subgroups. The researcher must determine the factors determining the subgroups and the varying levels between groups.
    • Missing Completely At Random, or MCAR
A higher level of randomness is termed missing completely at random (MCAR). In these instances the observed values of Y are truly a random sample of all Y values, with no underlying process that lends bias to the observed data. In simple terms, the cases with missing data are indistinguishable from cases with complete data.

 

Only MCAR allows for the use of any remedy desired. The distinction between these two levels is in the generalizability to the population.

 Diagnostic Tests for Levels of Randomness. 

    • The first diagnostic assesses the missing data process of a single variable Y by forming two groups: observations with missing data for Y and those with valid values of Y. Statistical tests are then performed to determine whether significant differences exist between the two groups on other variables of interest. Significant differences indicate the possibility of a nonrandom missing data process.
    • A second approach is an overall test of randomness that determines whether the missing data can be classified as MCAR. This test analyzes the pattern of missing data on all variables and compares it with the pattern expected for a random missing data process. If no significant differences are found, the missing data can be classified as MCAR. If significant differences are found, however, the researcher must use the approaches described previously to identify the specific missing data processes that are nonrandom.
As a result of these tests, the missing data process is classified as either MAR or MCAR, which then determines the appropriate types of potential remedies. Even though achieving the level of MCAR requires a completely random pattern in the missing data, it is the preferred type because it allows for the widest range of potential remedies.
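
The first diagnostic can be sketched as a simple two-group comparison. The pooled t statistic below is computed by hand purely for illustration, on hypothetical data; in practice one would run a proper two-sample t-test (and an overall MCAR test such as Little's test) in a statistics package.

```python
from statistics import mean, stdev

def group_t_statistic(a, b):
    """Pooled two-sample t statistic comparing the means of two groups."""
    na, nb = len(a), len(b)
    # Pooled variance with na + nb - 2 degrees of freedom
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# X values for cases where Y is missing vs. cases where Y is observed
x_missing = [55, 61, 58, 60]
x_valid = [40, 42, 38, 41, 39, 43]
t = group_t_statistic(x_missing, x_valid)
print(round(t, 2))  # -> 12.71
```

A t statistic this large signals that cases with missing Y differ sharply on X, pointing to a nonrandom (at best MAR) missing data process.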

STEP 4: SELECT THE IMPUTATION METHOD

    Imputation is the process of estimating the missing value based on valid values of other variables and/or cases in the sample. The objective is to employ known relationships that can be identified in the valid values of the sample to assist in estimating the missing values. However, the researcher should carefully consider the use of imputation in each instance because of its potential impact on the analysis 

    Comparison of Imputation Techniques for Missing Data


    Source: Hair, J. F. (2009). Multivariate data analysis.

    All of the imputation methods discussed in this section are used primarily with metric variables; nonmetric variables are left as missing unless a specific modeling approach is employed. Nonmetric variables are not amenable to imputation because even though estimates of the missing data for metric variables can be made with such values as a mean of all valid values, no comparable measures are available for nonmetric variables. As such, nonmetric variables require an estimate of a specific value rather than an estimate on a continuous scale. Estimating a missing value for a metric variable, such as an attitude or perception (even income), is quite different from estimating a respondent's gender when it is missing.
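
    The metric/nonmetric contrast can be sketched as follows: a mean is a sensible replacement value for a metric variable, while a nonmetric variable needs a specific category, such as the mode of the valid values. The data are hypothetical.

```python
from statistics import mean, mode

def impute(values, metric=True):
    """Fill None with the mean (metric) or the mode (nonmetric)."""
    valid = [v for v in values if v is not None]
    fill = mean(valid) if metric else mode(valid)
    return [fill if v is None else v for v in values]

income = [50.0, 60.0, None, 55.0]
gender = ["F", "M", "F", None, "F"]
print(impute(income))                # -> [50.0, 60.0, 55.0, 55.0]
print(impute(gender, metric=False))  # -> ['F', 'M', 'F', 'F', 'F']
```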

    Source:
    • Hair, J. F. (2009). Multivariate data analysis.







    Sunday, March 24, 2024

    The Shortest Mathematics Paper

     


    (Source: https://www.openculture.com/2015/04/shortest-known-paper-in-a-serious-math-journal.html)

    References:
    • https://www.openculture.com/2015/04/shortest-known-paper-in-a-serious-math-journal.html
    • https://www.numberphile.com/videos/the-shortest-ever-papers


    A CLASSIFICATION OF MULTIVARIATE TECHNIQUES (Hair, 2013)

     





    • A dependence technique may be defined as one in which a variable or set of variables is identified as the dependent variable to be predicted or explained by other variables known as independent variables. An example of a dependence technique is multiple regression analysis.
    Some specific dependence techniques
    (Source: Hair, J. F. (2009). Multivariate data analysis.)


    • In contrast, an interdependence technique is one in which no single variable or group of variables is defined as being independent or dependent. Rather, the procedure involves the simultaneous analysis of all variables in the set. Factor analysis is an example of an interdependence technique. 
    TYPES OF MULTIVARIATE TECHNIQUES
      1. Principal components and common factor analysis
      2. Multiple regression and multiple correlation
      3. Multiple discriminant analysis and logistic regression
      4. Canonical correlation analysis
      5. Multivariate analysis of variance and covariance
      6. Conjoint analysis
      7. Cluster analysis
      8. Perceptual mapping, also known as multidimensional scaling
      9. Correspondence analysis
      10. Structural equation modeling and confirmatory factor analysis
      Principal Components and Common Factor Analysis
      Factor analysis, including both principal component analysis and common factor analysis, is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). The objective is to find a way of condensing the information contained in a number of original variables into a smaller set of variates (factors) with a minimal loss of information. By providing an empirical estimate of the structure of the variables considered, factor analysis becomes an objective basis for creating summated scales. 

       

      Multiple Regression
      Multiple regression is the appropriate method of analysis when the research problem involves a single metric dependent variable presumed to be related to two or more metric independent variables. The objective of multiple regression analysis is to predict the changes in the dependent variable in response to changes in the independent variables. This objective is most often achieved through the statistical rule of least squares.
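
      The least squares rule can be sketched by solving the normal equations directly. The data below are hypothetical and noise-free, generated from y = 1 + 2*x1 + 3*x2, so the estimation should recover those coefficients exactly; a real analysis would of course use a statistics package rather than hand-rolled elimination.

```python
def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for i in range(k):  # forward elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Rows: [intercept, x1, x2]; y generated from y = 1 + 2*x1 + 3*x2
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 5]]
y = [1 + 2 * r[1] + 3 * r[2] for r in X]
print([round(c, 6) for c in least_squares(X, y)])  # -> [1.0, 2.0, 3.0]
```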

       

      Multiple Discriminant Analysis and Logistic Regression
      Multiple discriminant analysis (MDA) is the appropriate multivariate technique if the single dependent variable is dichotomous (e.g., male–female) or multichotomous (e.g., high–medium–low) and therefore nonmetric. As with multiple regression, the independent variables are assumed to be metric. Discriminant analysis is applicable in situations in which the total sample can be divided into groups based on a nonmetric dependent variable characterizing several known classes. The primary objectives of multiple discriminant analysis are to understand group differences and to predict the likelihood that an entity (individual or object) will belong to a particular class or group based on several metric independent variables.

      Logistic regression models, often referred to as logit analysis, are a combination of multiple regression and multiple discriminant analysis. This technique is similar to multiple regression analysis in that one or more independent variables are used to predict a single dependent variable. What distinguishes a logistic regression model from multiple regression is that the dependent variable is nonmetric, as in discriminant analysis. The nonmetric scale of the dependent variable requires differences in the estimation method and assumptions about the type of underlying distribution, yet in most other facets it is quite similar to multiple regression. Thus, once the dependent variable is correctly specified and the appropriate estimation technique is employed, the basic factors considered in multiple regression are used here as well. Logistic regression models are distinguished from discriminant analysis primarily in that they accommodate all types of independent variables (metric and nonmetric) and do not require the assumption of multivariate normality. However, in many instances, particularly with more than two levels of the dependent variable, discriminant analysis is the more appropriate technique.

       

      Canonical Correlation

      Canonical correlation analysis can be viewed as a logical extension of multiple regression analysis. With canonical analysis the objective is to correlate simultaneously several metric dependent variables and several metric independent variables. Whereas multiple regression involves a single dependent variable, canonical correlation involves multiple dependent variables. The underlying principle is to develop a linear combination of each set of variables (both independent and dependent) in a manner that maximizes the correlation between the two sets. Stated in a different manner, the procedure involves obtaining a set of weights for the dependent and independent variables that provides the maximum simple correlation between the set of dependent variables and the set of independent variables. 

      Multivariate Analysis of Variance and Covariance

      Multivariate analysis of variance (MANOVA) is a statistical technique that can be used to simultaneously explore the relationship between several categorical independent variables (usually referred to as treatments) and two or more metric dependent variables. As such, it represents an extension of univariate analysis of variance (ANOVA). Multivariate analysis of covariance (MANCOVA) can be used in conjunction with MANOVA to remove (after the experiment) the effect of any uncontrolled metric independent variables (known as covariates) on the dependent variables. The procedure is similar to that involved in bivariate partial correlation, in which the effect of a third variable is removed from the correlation. MANOVA is useful when the researcher designs an experimental situation (manipulation of several nonmetric treatment variables) to test hypotheses concerning the variance in group responses on two or more metric dependent variables.

      Conjoint Analysis

      Conjoint analysis is an emerging dependence technique that brings new sophistication to the evaluation of objects, such as new products, services, or ideas. The most direct application is in new product or service development, allowing for the evaluation of complex products while maintaining a realistic decision context for the respondent. The market researcher is able to assess the importance of attributes as well as the levels of each attribute while consumers evaluate only a few product profiles, which are combinations of product levels.

      Assume a product concept has three attributes (price, quality, and color), each at three possible levels (e.g., red, yellow, and blue). Instead of having to evaluate all 27 (3 * 3 * 3) possible combinations, a subset (9 or more) can be evaluated for their attractiveness to consumers, and the researcher knows not only how important each attribute is but also the importance of each level (e.g., the attractiveness of red versus yellow versus blue). Moreover, when the consumer evaluations are completed, the results of conjoint analysis can also be used in product design simulators, which show customer acceptance for any number of product formulations and aid in the design of the optimal product.
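
      The arithmetic of the example can be checked directly. The attribute levels below are hypothetical, and the subset shown is just a simple slice of the full factorial; a real conjoint study would select an orthogonal fractional design instead.

```python
from itertools import product

# Hypothetical attributes, each with three levels
attributes = {
    "price": ["low", "medium", "high"],
    "quality": ["basic", "standard", "premium"],
    "color": ["red", "yellow", "blue"],
}

profiles = list(product(*attributes.values()))
print(len(profiles))  # -> 27 full-factorial combinations (3 * 3 * 3)

# A fractional design asks respondents to rate only a subset of profiles
subset = profiles[::3]
print(len(subset))  # -> 9 profiles to evaluate
```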

      Cluster Analysis

      Cluster analysis is an analytical technique for developing meaningful subgroups of individuals or objects. Specifically, the objective is to classify a sample of entities (individuals or objects) into a small number of mutually exclusive groups based on the similarities among the entities. In cluster analysis, unlike discriminant analysis, the groups are not predefined. Instead, the technique is used to identify the groups. 

      Perceptual Mapping

      In perceptual mapping (also known as multidimensional scaling), the objective is to transform consumer judgments of similarity or preference (e.g., preference for stores or brands) into distances represented in multidimensional space. If objects A and B are judged by respondents as being the most similar compared with all other possible pairs of objects, perceptual mapping techniques will  position objects A and B in such a way that the distance between them in multidimensional space is smaller than the distance between any other pairs of objects. The resulting perceptual maps show the relative positioning of all objects, but additional analyses are needed to describe or assess which attributes predict the position of each object.

      As an example of perceptual mapping, let's assume the owner of a Burger King franchise wants to know whether the strongest competitor is McDonald's or Wendy's. A sample of customers is given a survey and asked to rate the pairs of restaurants from most similar to least similar. The results show that Burger King is most similar to Wendy's, so the owner knows that the strongest competitor is Wendy's, because it is perceived as the most similar. Follow-up analysis can identify what attributes influence perceptions of similarity or dissimilarity.

      Correspondence Analysis

      Correspondence analysis is a recently developed interdependence technique that facilitates the perceptual mapping of objects (e.g., products, persons) on a set of nonmetric attributes. Researchers are constantly faced with the need to “quantify the qualitative data” found in nominal variables. Correspondence analysis differs from the interdependence techniques discussed earlier in its ability to accommodate both nonmetric data and nonlinear relationships. In its most basic form, correspondence analysis employs a contingency table, which is the cross-tabulation of two categorical variables. It then transforms the nonmetric data to a metric level and performs dimensional reduction (similar to factor analysis) and perceptual mapping. 

      Correspondence analysis provides a multivariate representation of interdependence for nonmetric data that is not possible with other methods. As an example, respondents’ brand preferences can be cross-tabulated on demographic variables (e.g., gender, income categories, occupation) by indicating how many people preferring each brand fall into each category of the demographic variables. Through correspondence analysis, the association, or “correspondence,” of brands and the distinguishing characteristics of those preferring each brand are then shown in a two- or three-dimensional map of both brands and respondent characteristics. Brands perceived as similar are located close to one another. Likewise, the most distinguishing characteristics of respondents preferring each brand are also determined by the proximity of the demographic variable categories to the brand’s position.

      Structural Equation Modeling and Confirmatory Factor Analysis

      Structural equation modeling (SEM) is a technique that allows separate relationships for each of a set of dependent variables. In its simplest sense, structural equation modeling provides the appropriate and most efficient estimation technique for a series of separate multiple regression equations estimated simultaneously. It is characterized by two basic components: (1) the structural model and (2) the measurement model. The structural model is the path model, which relates independent to dependent variables. In such situations, theory, prior experience, or other guidelines enable the researcher to distinguish which independent variables predict each dependent variable. Models discussed previously that accommodate multiple dependent variables—multivariate analysis of variance and canonical correlation—are not applicable in this situation because they allow only a single relationship between dependent and independent variables. 

      The measurement model enables the researcher to use several variables (indicators) for a single independent or dependent variable. For example, the dependent variable might be a concept represented by a summated scale, such as self-esteem. In a confirmatory factor analysis the researcher can assess the contribution of each scale item as well as incorporate how well the scale measures the concept (reliability). The scales are then integrated into the estimation of the relationships between dependent and independent variables in the structural model. This procedure is similar to performing a factor analysis (discussed in a later section) of the scale items and using the factor scores in the regression.

      Source:

      • Hair, J. F. (2009). Multivariate data analysis.

      Friday, March 22, 2024

      Visual Summaries of the book "Introduction to Statistics and Data Analysis" by Heumann et al.

       Descriptive Data Analysis



      Summary of Tests for Continuous and Ordinal Variables


      (Part a)


      (Part b)


      (Part c)


      Summary of Tests for Nominal Variables




      Source: Heumann et al., Introduction to Statistics and Data Analysis, Springer International Publishing Switzerland, 2016.

      DOI 10.1007/978-3-319-46162-5

      Wednesday, March 20, 2024

      The 2024 calendar coincides with the 1996 calendar

       The year 2024 has the same calendar as the year 1996


      (Source)


      This can be explained by a mathematical rule. The years 2024 and 1996 share two characteristics: January 1 falls on a Monday, and both are leap years (February has a 29th day).


      A simple calculation: 365 (the number of days in a common year) divided by 7 (the number of days in a week) leaves a remainder of 1. So with each passing common year, a given calendar date advances by one weekday. For example, January 1, 2023 fell on a Sunday, so January 1, 2024 fell on a Monday.

      After 7 such shifts, the weekdays would realign. However, because a leap year inserts an extra day, the one-weekday advance described above is pushed forward by an additional day. This means the repeat cycle can occur after only 5 or 6 years (depending on whether one or two leap days fall in between).

      Combining the 4-year leap cycle with the 7-day week, two leap years 28 years apart have identical calendars. Thus 1996 is not the only year whose calendar matches 2024 exactly. According to the Time and Date website, five other leap years share the 2024 calendar: 1940, 1968, 2052, 2080, and 2120.
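
      The rule can be verified with Python's standard library: two years share the same calendar exactly when January 1 falls on the same weekday and the two years agree on being leap years.

```python
import calendar
from datetime import date

def same_calendar(y1, y2):
    """Two years share a calendar iff Jan 1 falls on the same weekday
    and both agree on leap-year status."""
    return (date(y1, 1, 1).weekday() == date(y2, 1, 1).weekday()
            and calendar.isleap(y1) == calendar.isleap(y2))

print(same_calendar(2024, 1996))  # -> True
# All five years listed by Time and Date also match 2024:
print([y for y in (1940, 1968, 2052, 2080, 2120) if same_calendar(2024, y)])
# -> [1940, 1968, 2052, 2080, 2120]
```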


      References:

      • https://tuoitre.vn/lich-nam-2024-trung-lich-nam-1996-co-la-khong-20240106080036351.htm
      • https://muctim.tuoitre.vn/lich-nam-2024-va-1996-giong-nhau-su-ky-la-co-quy-tac-101240106121720021.htm

      John Bates Clark Medal

      The John Bates Clark Medal is an award presented by the American Economic Association (AEA) to the economist who has made the most outstanding contribution in the United...