I hate to do this, but I just can't figure this out. I'm trying to merge two data sets: one is a year-by-year panel of corporate information, and the other holds daily stock returns for the same firms. The rub is that I need only the years of return data for the firms that are listed in the corporate year-by-year file. (I was forced to download all years for the list of firms, hence my problem.)
I've tried the various merges I'm familiar with, both with and without (in=) variables. I can't find this answer online, so I'm asking for help.
FILE1
cusipid year
0001 1995
0001 1996
0002 2000
0002 2001
0002 2002
0003 1993
FILE2
cusipid date year daily_ret
0001 12/31/1994 1994 .04
0001 01/01/1995 1995 .03
...
0002 01/01/2000 2000 .02
The output file should include all the FILE2 records whose cusipid and year match a row in the first file. This is the simplest version of what I have done:
proc sort data=file1; by cusipid year; run;
proc sort data=file2; by cusipid year; run;
data endfile; merge file1 file2; by cusipid year; run;
- - - - - - - - - -
I can't figure out how to drop the observations that do not have a matching year. I suspect the issue is that the first BY variable (cusipid) forces them to be included. Should I run a BY on that variable first?
Increasingly frustrated with the lack of simplicity in my mind,
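A DATA step MERGE keeps every observation from every input by default, whether or not it finds a match, so the unmatched years have to be filtered out explicitly. Below is a minimal sketch using in= flags and a subsetting IF. It is built only from the sample rows shown above (the elided "..." rows are not filled in), it assumes cusipid is stored as character so the leading zeros survive, and the flag names in_firm and in_ret are illustrative.

```sas
/* Sample rows from the post; cusipid read as character to keep leading zeros */
data file1;
  input cusipid $ year;
  datalines;
0001 1995
0001 1996
0002 2000
0002 2001
0002 2002
0003 1993
;
run;

data file2;
  input cusipid $ date :mmddyy10. year daily_ret;
  format date mmddyy10.;
  datalines;
0001 12/31/1994 1994 .04
0001 01/01/1995 1995 .03
0002 01/01/2000 2000 .02
;
run;

proc sort data=file1; by cusipid year; run;
proc sort data=file2; by cusipid year; run;

/* in_firm = 1 when FILE1 has this cusipid/year, in_ret = 1 when FILE2 does.
   The subsetting IF keeps only BY groups present in both inputs. */
data endfile;
  merge file1(in=in_firm) file2(in=in_ret);
  by cusipid year;
  if in_firm and in_ret;
run;
```

On these sample rows only the 0001/1995 and 0002/2000 returns survive. Writing `if in_firm;` instead would also keep FILE1 years that have no returns, with missing date and daily_ret.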
vikingo