Dear all,
I am trying to do something conceptually fairly simple. I would like to create a group variable that tells me which quartile an observation falls into, according to the value of a variable. I have tried to do that in this way:
by group year: xtile quant=x, nq(4)
but it didn't work. I have also tried a bunch of similar commands, but none of them worked. The problem is that none of the commands computing quantiles accepts the by option. Can someone suggest something better?
Many thanks!
Riccardo
Hi Riccardo,
There are two options. You can either use a loop:
* let's assume your year goes from 1990 to 2000
gen quant=.
forvalues i=1990/2000 {
capture drop xq
xtile xq=x if year==`i', nq(4)
replace quant=x if year==`i'
}
The other is to install egenmore (ssc install egenmore) and use its xtile() function with egen:
egen quant=xtile(x), n(4) by(year)
I think both are about equally efficient. Since you are doing this by year and ID, expect the commands to take a relatively long time to run.
Fernando
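The loop above handles one grouping variable (year), while the original question asked for quartiles within each group and year. Official Stata can do that too, by first combining the two variables with egen's group() function and then looping over the combinations. A minimal sketch to run from a do-file, using the auto data shipped with Stata as a stand-in: foreign plays the role of group, rep78 the role of year and mpg the role of x; the names gy, xq and quant are illustrative.

```stata
* Sketch (run as a do-file): quartiles within each combination of two variables.
* auto data as a stand-in: foreign ~ group, rep78 ~ year, mpg ~ x
sysuse auto, clear
egen gy = group(foreign rep78)          // one code per combination; missing if rep78 is missing
generate quant = .
levelsof gy, local(cells)
foreach c of local cells {
    capture drop xq
    xtile xq = mpg if gy == `c', nq(4)  // quartiles within this combination only
    replace quant = xq if gy == `c'
}
drop xq gy
tabulate quant foreign
```

With real data, replace foreign rep78 with group year (or any list of grouping variables) inside group().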
Dear all,
My aim is to generate quintiles of a continuous variable (alcohol use in grams; variable name: alc) by sex (variable: sex). alc ranges from 0 to 1700.
Are there other ways of doing this very simple task besides those suggested by Nick and Fernando above? I am finding it hard to apply Fernando's suggestion to my variables (I am fairly new to Stata), and I am using Stata at our institute, where downloading packages is either very hard or not allowed. I am using Stata/SE 16.0. Any advice would be appreciated.
Thank you!
| Car type
qmpg | Domestic Foreign | Total
-----------+----------------------+----------
1 | 13 5 | 18
2 | 9 5 | 14
3 | 11 5 | 16
4 | 11 3 | 14
5 | 8 4 | 12
-----------+----------------------+----------
Total | 52 22 | 74
This is the same method as that of @Fernando Rios in #2, except that there's a typo in his main code segment (and, manifestly, he's using 4 bins, not 5).
Code:
gen quant=.
forvalues i=1990/2000 {
capture drop xq
xtile xq=x if year==`i', nq(4)
replace quant=xq if year==`i'
}
However, for women (sex==2) there were zero observations in the second quintile, whereas the first quintile held 1,908 observations (48% of all observations for women) (see below). I suspect this is due to the skewness of the original alcohol variable.
Code:
#9 Those quintile bins look fairly useless in practice. If you need to categorise at all, I would use the values of your variable directly.
More discussion at https://www.stata-journal.com/articl...article=dm0095
https://www.stata-journal.com/articl...article=pr0054
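The empty second quintile can be checked directly from the cut points. When a variable has many tied values, two quintile cut points can coincide, and xtile then has no values to put in the bin between them. A sketch with rep78 from the auto data, which is heavily tied, shows this; for the alcohol data, replace rep78 with alc and add if sex == 2 to the _pctile and xtile lines. The scalar names q20 and q40 are illustrative.

```stata
* Sketch: tied values can make quintile cut points coincide
sysuse auto, clear
_pctile rep78, nquantiles(5)            // cut points returned in r(r1) ... r(r4)
scalar q20 = r(r1)
scalar q40 = r(r2)
display "20th percentile: " scalar(q20) "   40th percentile: " scalar(q40)
* xtile puts x <= q20 in bin 1 and q20 < x <= q40 in bin 2,
* so equal cut points leave bin 2 empty
xtile q5 = rep78, nq(5)
tabulate q5
```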
Nick, when I run
egen quant=xtile(x), n(4) by(year)
it says tile not found. I am running Stata/SE 14.2! I downloaded egenmore again, but it still says unknown function xtile(). I even tried running
xtile interd_quartile = interd, n(4) by (career_year)
but it led to the error: option by not allowed
#11 Installing egenmore from SSC will have precisely no consequences for xtile. It will not mean that xtile now supports a by() option. So your last problem report is not at all surprising.
But you say that you downloaded egenmore "again". From your post I can only guess that you put the files in the wrong place.
A correct installation will mean that Stata can see a file _gxtile.ado so that asking which will show you something like this:
Code:
. which _gxtile.ado
c:\ado\plus\_\_gxtile.ado
*! _gxtile version 1.2 UK 08 Mai 2006
It doesn't matter if your location is different: for example, you may not be using Windows, or your set-up may vary otherwise. What does matter is that Stata can find that program file to use it.
What happens sometimes is that people install files with their browser and put them in the wrong place. Or something dopey happens, such as the files acquire an irrelevant extension .html.
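To see where Stata looks for community-contributed files, a few official commands help. In the sketch below, sysdir lists Stata's system directories, including PLUS, where ssc install puts files, and adopath lists the full search path; the final check reports whether _gxtile.ado is visible.

```stata
* Sketch: check where Stata looks for community-contributed files
sysdir                              // system directories; PLUS is where ssc install writes
adopath                             // the full search path for ado-files
display "PLUS directory: `c(sysdir_plus)'"
* Does Stata see the egenmore file that defines egen's xtile() function?
capture noisily which _gxtile.ado
if _rc {
    display as error "_gxtile.ado not found: try  ssc install egenmore, replace"
}
```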
Note that https://ideas.repec.org/c/boc/bocode/s386401.html explicitly advises
| Car type
wanted | Domestic Foreign | Total
-----------+----------------------+----------
1 | 30 11 | 41
2 | 22 11 | 33
-----------+----------------------+----------
Total | 52 22 | 74
Here is the code all in one for convenience:
Code:
sysuse auto, clear
ssc install egenmore, replace
egen wanted = xtile(mpg), n(2) by(foreign)
tab wanted foreign
Dear all,
I have a fairly simple task: I need to generate a new variable based on tertiles of the variable of interest, computed within the groups defined by two other variables. In my case, I need to generate education tertiles (based on years of schooling) within each combination of sex and year of birth. I have done this taking into account only year of birth, as follows:
Code:
My question is: how do I add gender to the by() option? Many thanks for any tips!
Best regards,
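With egenmore installed, by() takes a list of variables separated by spaces (not commas), and the quantiles are then computed within every combination of them. A sketch with the auto data, where foreign and rep78 stand in for sex and year of birth and mpg for years of schooling; the name mpg_t3 is illustrative.

```stata
* Sketch, assuming egenmore has been installed with: ssc install egenmore
* auto data as a stand-in: foreign and rep78 ~ sex and year of birth, mpg ~ schooling
sysuse auto, clear
egen mpg_t3 = xtile(mpg), n(3) by(foreign rep78)   // by() variables separated by spaces
tabulate mpg_t3 foreign
```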
Dear Nick,
Thanks for the prompt answer! My intuition does not always match Stata's (I was trying with commas between the variables; still learning...).
A follow-up: when generating the new variable, I should assign tied values their mean rank. In SPSS, this is done by default (see here): "First, the observations are ordered and given unique, sequential ranks. Then, tied observations have their assigned ranks averaged together." Could this be done in Stata?
Best,
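One way to get averaged ranks in Stata is egen's rank() function, which by default gives tied values the mean of the ranks they would otherwise occupy and works with the by prefix. The sketch below turns those ranks into tertiles with ceil(3 * rank / n); that binning rule is an assumption for illustration, not necessarily the formula SPSS uses. Again the auto data stand in: foreign for the grouping variables and mpg for years of schooling; mpg_rank, mpg_n and mpg_t3 are illustrative names.

```stata
* Sketch: tertiles from ranks in which tied values share their average rank
* auto data as a stand-in: foreign ~ grouping variables, mpg ~ years of schooling
sysuse auto, clear
bysort foreign: egen double mpg_rank = rank(mpg)   // ties get the mean of their ranks
bysort foreign: egen mpg_n = count(mpg)            // nonmissing values per group
generate byte mpg_t3 = ceil(3 * mpg_rank / mpg_n)  // assumed binning rule: 1, 2 or 3
tabulate mpg_t3 foreign
```

Because tied values share one rank, they always land in the same tertile. For sex and year of birth, the prefix would be bysort sex birthyear: (with birthyear a hypothetical variable name).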