python - Splitting numpy arrays based on categorical variable
I'm trying to split ages and weights based on a categorical variable "obese" and plot the two sets with different colors. I think I might be doing the list comprehension wrong. When I plot, I see only one color and all the data points.
    import numpy as np
    import matplotlib.pyplot as plt

    ages = np.array([20, 22, 23, 25, 27])
    weights = np.array([140, 144, 150, 156, 160])
    obese = np.array([0, 0, 0, 1, 1])

    ages_normal = [ages for i in range(0, len(obese)) if obese[i] == 0]
    weights_normal = [weights for i in range(0, len(obese)) if obese[i] == 0]
    ages_obese = [ages for i in range(0, len(obese)) if obese[i] == 1]
    weights_obese = [weights for i in range(0, len(obese)) if obese[i] == 1]

    plt.scatter(ages_normal, weights_normal, color="b")
    plt.scatter(ages_obese, weights_obese, color="r")
    plt.show()
I'd do something like this:
    import numpy as np
    import matplotlib.pyplot as plt

    ages = np.array([20, 22, 23, 25, 27])
    weights = np.array([140, 144, 150, 156, 160])
    obese = np.array([0, 0, 0, 1, 1])

    # list() lets the rows be iterated twice; in Python 3 a bare zip is a one-shot iterator
    data = list(zip(ages, weights, obese))
    data_normal = np.array([(a, w) for (a, w, o) in data if o == 0])
    data_obese = np.array([(a, w) for (a, w, o) in data if o == 1])

    plt.scatter(data_normal[:, 0], data_normal[:, 1], color="b")
    plt.scatter(data_obese[:, 0], data_obese[:, 1], color="r")
    plt.show()
But this might be more efficient:
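    # Stack the three arrays as columns: age, weight, obese flag
    data = np.vstack([ages, weights, obese]).T
    # np.where returns a tuple; [0] takes the array of matching row indices
    ind_n = np.where(data[:, 2] == 0)[0]
    ind_o = np.where(data[:, 2] == 1)[0]

    plt.scatter(data[ind_n, 0], data[ind_n, 1], color="b")
    plt.scatter(data[ind_o, 0], data[ind_o, 1], color="r")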
But you are correct, your list comprehensions are a bit off. Maybe you wanted something like this:
    ages_normal = [ages[i] for i in range(0, len(obese)) if obese[i] == 0]
    weights_normal = [weights[i] for i in range(0, len(obese)) if obese[i] == 0]
    ages_obese = [ages[i] for i in range(0, len(obese)) if obese[i] == 1]
    weights_obese = [weights[i] for i in range(0, len(obese)) if obese[i] == 1]
The difference is the added indexing on ages/weights.
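To see why the missing index matters, here is a small sketch comparing the two forms on the ages array from the question. The names unindexed and indexed are made up for illustration. Without `[i]`, every matching position appends the entire array, so each "group" ends up containing all the data points, which would explain seeing every point in a single color.

```python
import numpy as np

ages = np.array([20, 22, 23, 25, 27])
obese = np.array([0, 0, 0, 1, 1])

# Without indexing: each matching i appends the whole `ages` array
unindexed = [ages for i in range(len(obese)) if obese[i] == 0]

# With indexing: each matching i appends only the i-th age
indexed = [ages[i] for i in range(len(obese)) if obese[i] == 0]

# One full copy of `ages` per non-obese entry vs. one value per non-obese entry
print(len(unindexed), unindexed[0].shape)
print(len(indexed))
```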
All 3 approaches generate the graph you're looking for.
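As a possible fourth variant, not part of the answer above, NumPy boolean masks can do the split without a comprehension or np.where. This is only a sketch: the mask name is_obese is an assumption introduced here, and the other names follow the question.

```python
import numpy as np

ages = np.array([20, 22, 23, 25, 27])
weights = np.array([140, 144, 150, 156, 160])
obese = np.array([0, 0, 0, 1, 1])

# Boolean mask: True where the categorical flag marks the entry as obese
is_obese = obese == 1

# Indexing with a mask keeps only the elements where the mask is True;
# ~is_obese is the element-wise negation
ages_normal, weights_normal = ages[~is_obese], weights[~is_obese]
ages_obese, weights_obese = ages[is_obese], weights[is_obese]
```

The four resulting arrays can be passed to plt.scatter the same way as in the question, one call per group with its own color.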