R (Error): x argument is missing, with no default - General

link管理
链接快照平台
输入网页链接，自动生成快照
标签化管理网页链接
相关文章推荐
卖萌的烤面包 · 浅谈管理层收购对企业发展的利弊分析_正保会计网校· 2 周前 ·
豁达的板栗 · 曹操家族DNA完全確定 ...· 7 月前 ·
鬼畜的水龙头 · 辩论赛- 知乎· 1 年前 ·
宽容的山楂 · 视频｜聊城高新区举行幼儿园校园文化观摩· 1 年前 ·
机灵的乌冬面 · 好胆你就麦造_百度百科· 1 年前 ·
I am working with R.
I have some data ("train_data") below:
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
#view data
head(train_data)
         a1        b1  c1
1 110.36832  90.66670 662
2  96.28321 102.68244 810
3 101.95640  98.17639 956
4 121.58001  93.04896 697
5  95.08541 104.64527 712
In this example, I am interested in performing an arbitrary task:
Choose 7 random numbers ( "random_1" (between 80 and 120), "random_2" (between "random_1" and 120) , "random_3" (between 85 and 120), "random_4" (between random_2 and 120), "split_1" (between 0 and 1), "split_2" (between 0 and 1), "split_3" (between 0 and 1 ))
Using these random numbers, perform a series of data manipulation procedures on "train_data" (these data manipulation procedures will be defined in the function below). For a specific set of 7 numbers, these data manipulation procedures will calculate a "total" mean.
Repeat steps 1) and 2) and see if you can find the set of these 7 numbers that produce the biggest value of the "total" mean.
Thus, I am trying to perform an optimization task.
Earlier, I was able to solve this task using a "random search":
# code for random search
results_table <- data.frame()
for (i in 1:10 ) {
    #generate random numbers
    random_1 =  runif(1, 80, 120)
    random_2 =  runif(1, random_1, 120)
    random_3 =  runif(1, 85, 120)
    random_4 =  runif(1, random_3, 120)
    #bin data according to random criteria
    train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
    train_data$cat = as.factor(train_data$cat)
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
    split_1 =  runif(1,0, 1)
    split_2 =  runif(1, 0, 1)
    split_3 =  runif(1, 0, 1)
    #calculate random quantile ("quant") for each bin
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_1)))
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_2)))
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_3)))
    #create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1"
    table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
    table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
    table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
    #group all tables
    final_table = rbind(table_a, table_b, table_c)
    #create a table: for each bin, calculate the average of "diff"
    final_table_2 = data.frame(final_table %>%
                                   group_by(cat) %>%
                                   summarize(
                                       mean = mean(diff)
    #add "total mean" to this table
    final_table_2 = data.frame(final_table_2 %>% add_row(cat = "total", mean = mean(final_table$diff)))
    #format this table: add the random criteria to this table for reference
    final_table_2$random_1 = random_1
    final_table_2$random_2 = random_2
    final_table_2$random_3 = random_3
    final_table_2$random_4 = random_4
    final_table_2$split_1 = split_1
    final_table_2$split_2 = split_2
    final_table_2$split_3 = split_3
    final_table_2$iteration_number = i
    results_table <- rbind(results_table, final_table_2)
    final_results = dcast(setDT(results_table), iteration_number + random_1 + random_2 + random_3 + random_4 + split_1 + split_2 + split_3 ~ cat, value.var = 'mean')
#keep 5 largest resuts
  final_results <- head(final_results[order(-total)], 5)
Now, we can view the results of the random search:
 #view results
final_results
   iteration_number  random_1 random_2  random_3  random_4    split_1   split_2   split_3         a         b         c total
1:                8 104.52182 104.8939  96.63609  99.14640 0.45389635 0.7970865 0.8264969 0.4560440 0.7954545 0.8265306 0.755
2:               10 119.04797 119.9907  93.13250  93.62925 0.27018809 0.5025505 0.6707737 0.2758621 0.5000000 0.6681465 0.632
3:                1 114.69535 117.7922 109.89274 116.39624 0.61857197 0.9609914 0.2661892 0.6180022 0.9615385 0.2702703 0.623
4:                6  85.64905 100.8127  94.02205 106.41212 0.00197946 0.7476889 0.1235777 0.2500000 0.7470588 0.1234568 0.442
5:                3 106.14908 119.7681  95.61753 100.73192 0.20678470 0.1787206 0.7166830 0.2111801 0.1802030 0.7146067 0.423
According to the above table (for a very small random search of 10 iterations), the combination of " random_1, random_2, random_3, random_4, split_1, split_2, split_3 " = ( 104.52182 104.8939 96.63609 99.14640 0.45389635 0.7970865 0.8264969) produces the highest "total" of 0.755 .
My Problem: The "random search" is not a very effective way at solving this problem. I am trying to use a different optimization algorithm to try and identify a set of random_1, random_2, random_3, random_4, split_1, split_2, split_3 that produces the biggest value of total .
From the following link (A quick tour of GA), I decided to follow the example for optimizing this problem using an optimization algorithm called the "genetic algorithm":
#example of the genetic algorithm
#load library
library(GA)
#define function
Rastrigin <- function(x1, x2)
  20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
x1 <- x2 <- seq(-5.12, 5.12, by = 0.1)
f <- outer(x1, x2, Rastrigin)
#run optimization algorithm
GA <- ga(type = "real-valued", 
         fitness =  function(x) -Rastrigin(x[1], x[2]),
         lower = c(-5.12, -5.12), upper = c(5.12, 5.12), 
         popSize = 50, maxiter = 1000, run = 100)
#view results of the genetic algorithm (the answer that optimizes the function in this example is (x1 = 5.4 e-05, x2 = 6.400 e-05)
summary(GA)
              x1           x2
[1,] 5.41751e-05 6.400989e-05
I now want to apply the "genetic algorithm" to my problem. This requires the user to define a "fitness function" that formalizes the requirements and directions for the "genetic algorithm". For my problem, I defined the "fitness function" as follows:
#define fitness function
fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {
    #bin data according to random criteria
    train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
    train_data$cat = as.factor(train_data$cat)
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
    split_1 =  runif(1,0, 1)
    split_2 =  runif(1, 0, 1)
    split_3 =  runif(1, 0, 1)
    #calculate  quantile ("quant") for each bin
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_1)))
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_2)))
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_3)))
    #create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1"
    table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
    table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
    table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
    #group all tables
    final_table = rbind(table_a, table_b, table_c)
# calculate the total mean : this is what needs to be optimized
    mean = mean(final_table$diff)
Just to test that this function works:
#call function for a specific set of the 7 numbers 
 a = fitness(85, 100, 90, 110, 0.5, 0.7, 0.3)
# view the corresponding "total mean"
[1] 0.845
Now, I am trying to put everything together and instruct the "genetic algorithm" to optimize the "fitness function" I defined by considering different ranges of values for "random_1, random_2, random_3, random_4, split_1, split_2, split_3"
#genetic algorithm for my example:
GA <- ga(type = "real-valued", 
         fitness =  fitness,
         lower = c(80, 80, 80, 80, 0,0,0), upper = c(120, 120, 120, 120, 1,1,1), 
         popSize = 50, maxiter = 1000, run = 100)
But this produces the following error:
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
 Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred. 
Does anyone know why this error is being produced? Can someone please show me what I am doing wrong?
Thanks
              Your first fitness function has one argument: a vector.

Your second fitness function has seven arguments: all scalars.

And note that you don't use the last three split* arguments:

you redefine them in your function.
              I am working with R. I am learning about how to optimize functions and estimate the maximum or minimum points of these functions.
For example, I created some random data ("train data):
#load libraries
library(dplyr)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
I also created the following function ("fitness") that takes seven inputs ( "random_1" (between 80 and 120), "random_2" (between "random_1" and 120) , "random_3" (between 85 and 120), "random_4" (between random_2 and 120), "split_1" (between 0 and 1), "split_2" (between 0 and 1), "split_3" (between 0 and 1 )), , performs a series of data manipulation procedures and returns a "total" mean:
fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {
    #bin data according to random criteria
    train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
    train_data$cat = as.factor(train_data$cat)
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
    split_1 =  runif(1,0, 1)
    split_2 =  runif(1, 0, 1)
    split_3 =  runif(1, 0, 1)
    #calculate  quantile ("quant") for each bin
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_1)))
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_2)))
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = quantile(c1, prob = split_3)))
    #create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1"
    table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
    table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
    table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
    #group all tables
    final_table = rbind(table_a, table_b, table_c)
# calculate the total mean : this is what needs to be optimized
    mean = mean(final_table$diff)
Just as a sanity check, we can verify that this function actually works:
#testing the function at some specific input:
 a <- fitness(80,80,80,80,0.6,0.2,0.9)
[1] 0.899
Now, using the following reference on optimization with R (https://cran.r-project.org/web/packages/optimization/optimization.pdf and https://cran.r-project.org/web/packages/optimization/vignettes/vignette_master.pdf), I am trying to perform some common optimization techniques on this function.
For example:
#load library
library(optimization)
Nelder-Meade Optimization with an Initial Guess:
optim_nm(fitness, start = c(80,80,80,80,0,0,0))
Nelder-Meade Optimization with fixed parameters:
optim_nm(fun = fitness, k = 2)
Optimization using Simulated Annealing:
ro_sa <- optim_sa(fun = fitness,
start = c(runif(7, min = -1, max = 1)),
lower = c(80,80,80,80,0,0,0),
upper = c(120,120,120,120,1,1,1),
trace = TRUE,
control = list(t0 = 100,
nlimit = 550,
t_min = 0.1,
dyn_rf = FALSE,
rf = 1,
r = 0.7
But all of these procedures return a similar error:
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
 Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred. 
And this is preventing me from visualizing the results of these optimization algorithms :
#code for visualizations
plot(ro_sa)
 plot(ro_sa, type = "contour")
Can someone please show me what am I doing wrong? Is it possible to fix this?
Thanks
              Han identified your key issue, I would guess he bowed out of the conversation because your response did not address his posting, but seemed to be a restatement of your initial post...
That said, here is some code that you can run and reflect on how it might be relevant to your scenario.
fitness1 <- function(a,b,c,d,e,f,g){paste0(a,b,c,d,e,f,g)}
fitness1(80,80,80,80,0.6,0.2,0.9)
fitness1(c(80,80,80,80,0.6,0.2,0.9))
fitness2 <- function(v){paste0(v)}
fitness2(80,80,80,80,0.6,0.2,0.9)
fitness2(c(80,80,80,80,0.6,0.2,0.9))
which of these fitness*() calls error and which don't, and why ?