ValueError: Found input variables with inconsistent numbers of samples: [143, 426]

By Jennifer, 8 months ago

How can I fix this error? ValueError: Found input variables with inconsistent numbers of samples: [143, 426]


#split the data set into independent (X) and dependent (Y) data sets
X = df.iloc[:,2:31].values
Y = df.iloc[:,1].values

#split the data set into 75% training and 25% testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

#scale the data (feature scaling)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_train = sc.fit_transform(X_test)

#Fit a Logistic Regression model to the training set

classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)

And here are the shapes of X_train and Y_train:

X_train.shape
(143, 29)
Y_train.shape
(426,)

Error message:

ValueError                                Traceback (most recent call last)
in ()
      2 
      3 classifier = LogisticRegression(random_state = 0)
----> 4 classifier.fit(X_train, Y_train)
      5 #Using KNeighborsClassifier Method of neighbors class to use Nearest Neighbor algorithm
      6 

2 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    210     if len(uniques) > 1:
    211         raise ValueError("Found input variables with inconsistent numbers of"
--> 212                          " samples: %r" % [int(l) for l in lengths])
    213 
    214 

ValueError: Found input variables with inconsistent numbers of samples: [143, 426]

Python
Machine-learning
2 Answers

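You have a bug on the line X_train = sc.fit_transform(X_test), where you assign the scaled test set to X_train instead of X_test. As a result, X_train ends up with the 143 test rows while Y_train still holds the 426 training labels, which is exactly the [143, 426] mismatch in the error. Take a look at the corrected code below.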

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

#split the data set into independent (X) and dependent (Y) data sets
X = df.iloc[:,2:31].values
Y = df.iloc[:,1].values

#split the data set into 75% training and 25% testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

#scale the data (feature scaling): fit on the training set only, then apply the same scaling to the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#Fit a Logistic Regression model to the training set
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)
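The corrected code relies on the asker's df, which isn't shown. The sketch below is self-contained: it assumes hypothetical random data with 569 rows and 29 features (143 + 426 rows, matching the reported shapes) and shows how the buggy scaling line produces exactly the [143, 426] mismatch, while the corrected version fits without error. The names bad_X_train, X_train_scaled and X_test_scaled are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df: 569 rows, 29 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(569, 29))
Y = rng.integers(0, 2, size=569)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (426, 29) (143, 29)

# Buggy version: the second line overwrites the training features with the scaled test set.
sc = StandardScaler()
bad_X_train = sc.fit_transform(X_train)
bad_X_train = sc.fit_transform(X_test)
error_message = ""
try:
    LogisticRegression(random_state=0).fit(bad_X_train, Y_train)
except ValueError as err:
    error_message = str(err)
print(error_message)  # Found input variables with inconsistent numbers of samples: [143, 426]

# Fixed version: fit the scaler on X_train, then only transform X_test.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train_scaled, Y_train)
print(X_train_scaled.shape, Y_train.shape)  # (426, 29) (426,)
```

A quick assert X_train.shape[0] == Y_train.shape[0] before calling fit is a cheap way to catch this kind of mix-up early.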

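Also, do not call fit_transform on X_test. That refits the scaler on the test data, so the test set would be scaled with its own mean and standard deviation instead of the ones calculated from X_train. Use sc.transform(X_test) instead.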
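To see the difference concretely: after fitting, StandardScaler stores the training statistics (mean_ and scale_), and transform applies those same statistics to new data. The sketch below uses hypothetical random data, with the test set deliberately shifted, and checks that sc.transform(X_test) equals standardizing X_test with X_train's column means and population standard deviations, whereas refitting on X_test re-centers it on its own mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data; the test set is shifted so the effect of refitting is visible.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=2.0, size=(426, 3))
X_test = rng.normal(loc=6.0, scale=2.0, size=(143, 3))

sc = StandardScaler()
sc.fit(X_train)

# transform() reuses the mean and std learned from X_train (ddof=0, like np.std's default).
X_test_scaled = sc.transform(X_test)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, manual))  # True

# Refitting on the test set centers it on its own mean, which hides the shift.
X_test_refit = StandardScaler().fit_transform(X_test)
print(X_test_scaled.mean(axis=0).round(2))  # positive: the shift relative to the training data
print(X_test_refit.mean(axis=0).round(2))   # approximately zero in every column
```

Using the training statistics for both sets keeps the model's inputs on the same scale it was trained on.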

Gilbertcane

It sounds like the shapes of your features and labels are not aligned. I faced a similar problem while fitting a regression model: in my case, the number of rows in x was not equal to the number of rows in y. Usually, x holds your features and y holds the target you want to predict. Your feature array should not be 1D, so check the shape of x, and if it is 1D, convert it to 2D:


x = x.reshape(-1, 1)
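For reference, here is a minimal sketch of that check with hypothetical single-feature data. NumPy's reshape returns a new array rather than changing x in place, so the result has to be assigned back; the -1 tells NumPy to infer the number of rows.

```python
import numpy as np

# Hypothetical single-feature data: 5 samples stored as a 1D array.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 1, 1])
print(x.shape)  # (5,)

# reshape returns a new array, so assign it back; -1 lets NumPy infer the row count.
x = x.reshape(-1, 1)
print(x.shape)  # (5, 1)

# Before fitting, the number of rows in x must match the number of labels in y.
assert x.shape[0] == y.shape[0]
```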


Also, you may get this error if you remove rows containing nulls from X_train and y_train independently of each other. y_train probably has few or no nulls, while X_train probably has some. When a row is removed from X_train but the same row is not removed from y_train, the two arrays fall out of sync and end up with different lengths. Instead, remove nulls before you separate X and y.
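A minimal pandas sketch of that advice, assuming a hypothetical DataFrame with one missing feature value: dropping nulls from the features alone leaves X and y with different lengths, while calling dropna() on the whole frame before separating X and y keeps the rows aligned. The column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a missing value in feature_a.
df = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [0.5, 0.1, 0.3, 0.9],
    "label": [0, 1, 0, 1],
})

# Wrong: dropping nulls from the features alone removes row 1 from X but not from y.
X_bad = df[["feature_a", "feature_b"]].dropna()
y_bad = df["label"]
print(len(X_bad), len(y_bad))  # 3 4

# Right: drop incomplete rows from the whole frame first, then separate X and y.
clean = df.dropna()
X = clean[["feature_a", "feature_b"]].values
y = clean["label"].values
print(X.shape, y.shape)  # (3, 2) (3,)
```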



