Studying the Court of Justice of the European Union in R
This is an introductory guide to studying the Court of Justice of the European Union (CJEU) in R, using data from the IUROPA CJEU Database.
It starts with a step-by-step guide describing how to load data from the database into R, then offers some advice on how to manage the data, and concludes with some examples of how it can be visualised.
This is not intended to be a comprehensive guide for learning R, but rather a beginner's guide to studying the CJEU using the software. One resource that cannot be recommended enough for those wanting to learn R more thoroughly is the brilliant and free book R for Data Science by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund.
Table of contents:
- Accessing data
1.1. Installing the data base
1.2. Data tables
1.3. Download data
- Data management
2.1. Combining data across data sets
2.2. Dealing with lists
- Analysing data
3.1. Length of procedures over time
3.2. Number of judgments in policy area
3.3. Age of appointed judges
- Conclusion
1. Accessing the data
In this first part I describe how to access data from the IUROPA project in R, give a general overview of what the data contains, and show how to download it.
1.1. Installing the IUROPA package
The easiest way of accessing data from the IUROPA project in R is by installing the dedicated R package developed by Joshua Fjelstul.
As the package is hosted on GitHub, a neat way to install it is through the devtools package. If you do not have this package installed already, you can install it by running the following line:
install.packages("devtools")
As soon as the devtools package is installed you can install the IUROPA package directly from GitHub:
devtools::install_github("jfjelstul/iuropa")
Once installed, the package can be loaded as follows:
library(iuropa)
1.2. Data tables
You are now ready to use the IUROPA package to load data on the CJEU directly into R. There are a total of 17 different data tables available for download, all of which can be described in further detail by running the following:
description <- describe_tables(component="cjeu_database_platform")
View(description)
You can find a comprehensive overview of the tables included in the data base and the variables within them in the code book. The most important tables are briefly described below, along with a selection of variables found within them.
The cases table contains all cases lodged before the CJEU.
cjeu_case_id
- The ID code of the case before the Court, assigned to a case at the moment it enters the docket. This is the ID code used by the Court. In older cases official sources will often list case numbers with digits only (for example 6/64); in our data base, we always include the prefix indicating the tribunal (C-6/64).
is_joined
- Binary variable: 1 if the case has been joined into another case, 0 if it has not. If is_joined is 1, decisions published in the case will be published under the case number listed under the variable joined_to.
joined_to
- If the case has been joined, this lists the first case of the cases it was joined with.
case_name
- The name of the case, for example Costa v E.N.E.L.
court
- The court in which the procedure took place: the Court of Justice, General Court, or Civil Service Tribunal.
case_year
- The year in which the case was initiated.
is_pending
- Binary variable observing whether a case is completed or if it is still pending before the CJEU (as of the last update of the database).
is_removed
- Binary variable observing whether a case is removed from the register.
The judgments table includes data on the judgments of the Court, excluding other types of decisions such as AG opinions and orders of the Court.
cjeu_case_id
- The ID code of the case before the Court. If multiple cases are joined, this variable lists the first of the joined cases.
ecli
- ECLI numbers are the official ID codes given by the CJEU to its decisions.
celex
- CELEX numbers are the official ID codes given to European legislation by EUR-lex. Most, but not all, judgments are given unique CELEX numbers, and they are sometimes used as ID codes in data bases on the work of the CJEU.
decision_date
- The date of the judgment. The date will be returned in text format; use the as.Date function to convert it into a date.
proceeding_date
- The date of the lodging of the first case of the procedures leading up to the judgment. The date will be returned in text format; use the as.Date function to convert it into a date.
duration_days
- The duration of the procedure leading up to the decision, in days. Corresponds to the subtraction of proceeding_date from decision_date.
court
- The court in which the procedure took place: The Court of Justice, General Court, or Civil Service Tribunal.
list_authentic_languages
- The authentic languages of the judgments, which generally correspond to the languages of the parties before the CJEU.
list_procedures
- The type of procedure before the court, along with a list of outcomes when applicable. References for preliminary rulings have no listed outcomes, as the Court does not decide the case as such but merely provides an answer to the question referred by the national court.
is_preliminary_ruling
- Binary variable observing whether the judgment is made in a preliminary ruling procedure.
is_direct_action
- Binary variable observing whether the judgment is made in a direct action.
is_appeal
- Binary variable observing whether the judgment is made in an appeal procedure.
list_referring_member_states
- A list of the member states of the referring court(s). There is usually only one referring member state, but there could be several in joined procedures. Separated by comma.
count_observers
- The number of observations submitted to the CJEU as part of the procedure.
list_observers
- The names of the parties making submissions to the CJEU in the procedure.
iuropa_judge_rapporteur_id
- The ID code of the judge serving as judge rapporteur in the judgment, corresponding to iuropa_judge_id in the judges data set.
judge_rapporteur
- The surname of the judge serving as judge rapporteur in the judgment.
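The date variables described above arrive as text. The following is a minimal sketch of the conversion on a made-up two-row data frame; the dates are invented for illustration and are assumed to be in yyyy-mm-dd format, which may differ from what the database returns. It shows how as.Date turns text into dates and how a duration in days follows from subtracting proceeding_date from decision_date.

```r
# Toy stand-in for the judgments table; the dates are invented for illustration
toy <- data.frame(
  proceeding_date = c("2015-03-01", "2016-01-01"),
  decision_date   = c("2015-03-31", "2017-01-01"),
  stringsAsFactors = FALSE
)

# Convert text to Date objects (assumes ISO format, yyyy-mm-dd)
toy$proceeding_date <- as.Date(toy$proceeding_date)
toy$decision_date   <- as.Date(toy$decision_date)

# Duration in days: decision date minus proceeding date
toy$duration_check <- as.numeric(toy$decision_date - toy$proceeding_date)
toy$duration_check
```

If the text dates use another format, the format argument of as.Date can be used to describe it.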
The decisions of the CJEU include not only judgments, but also other published documents such as orders of the Court, orders of the President of the Court, and opinions of the advocates general (AG opinions). Each decision is identified by a unique "European Case Law Identifier", or ecli for short. For example, Costa v ENEL saw the publication of three decisions: an order of the Court (ECLI:EU:C:1964:51), an AG opinion (ECLI:EU:C:1964:51), and a judgment (ECLI:EU:C:1964:66).
cjeu_case_id
- The ID code of the case before the Court. If multiple cases are joined, this variable lists the first of the joined cases.
ecli
- ECLI numbers are the official ID codes given by the CJEU to its decisions.
celex
- CELEX numbers are the official ID codes given to European legislation by EUR-lex. The most important decisions of the CJEU are given unique CELEX numbers, and they are sometimes used as ID codes in data bases on the work of the CJEU.
decision_type
- The type of decision, such as AG opinion, order, or judgment.
decision_date
- The date of the decision. The date will be returned in text format; use the as.Date function to convert it into a date.
court
- The court in which the procedure took place: The Court of Justice, General Court, or Civil Service Tribunal.
This data table includes information about the judges, advocates general, and registrars of the CJEU.
iuropa_judge_id
- A unique ID code for each judge of the CJEU, created for the purpose of the IUROPA database. Historically some judges have the same name and the names of the judges might be spelled in a variety of ways, making the judge ID a useful variable for working with the judges, registrars, and advocates general of the CJEU.
full_name
- The full name of the individual.
last_name
- The surname of the individual.
last_name_latin
- The surname of the individual, with accents and special characters removed.
start_date
- The date the person first started working for the CJEU.
end_date
- The date the person last worked for the CJEU.
member_state
- The appointing member state.
birth_year
- The year of birth of the individual.
is_female
- Binary variable: 1 if the judge is female, 0 if not.
was_judge
- Binary variable observing whether the individual worked as a judge prior to beginning at the CJEU.
was_academic
- Binary variable observing whether the individual worked in an academic position prior to beginning at the CJEU.
was_civil_servant
- Binary variable observing whether the individual worked as a civil servant prior to beginning at the CJEU.
was_lawyer
- Binary variable observing whether the individual worked as a lawyer prior to beginning at the CJEU.
was_politician
- Binary variable observing whether the individual worked as a politician prior to beginning at the CJEU.
As judges might have been working in different positions and for more than one of the tribunals making up the CJEU, more than one start and end date, as well as positions, might apply for a single individual. For more detailed information about the appointments to positions in the different tribunals of the CJEU, see the appointments data set.
The appointments data set observes the appointment of judges to various positions before the CJEU. Multiple observations can therefore be assigned to a single judge. The data set does not currently observe the reappointment of sitting judges.
iuropa_judge_id
- The ID code of the individual judge.
start_date
- The date the person first started working for the CJEU in the recorded position.
end_date
- The date the person last worked for the CJEU in the recorded position.
position
- The position of the judge at the CJEU: judge, registrar, Advocate General, President, or Vice-President.
duration_days
- The duration of the term served by the individual, in days.
The assignments data set observes which judges sign the decisions of the CJEU. Each row represents one judge assigned to a decision. Generally, the Court is composed of one of three chamber compositions: a small chamber of three judges, a medium chamber of five or seven judges, or a grand chamber of 15 judges or more.
cjeu_case_id
- The ID code of the case before the Court. If multiple cases are joined, this variable lists the first of the joined cases.
ecli
- ECLI numbers are the official ID codes given by the CJEU to its decisions.
iuropa_judge_id
- The ID code of the given judge.
judge
- The name of the judge.
is_judge_rapporteur
- Binary variable observing whether the judge served as judge rapporteur.
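Because each row of this table represents one judge assigned to one decision, counting rows per ECLI number gives the size of the chamber. A minimal sketch on invented data (the ECLI numbers and judge IDs below are placeholders, not real values from the database):

```r
# Toy stand-in for the assignments table; all values are invented
toy_assignments <- data.frame(
  ecli = c(rep("ECLI:EU:C:0000:1", 3), rep("ECLI:EU:C:0000:2", 5)),
  iuropa_judge_id = c("J:001", "J:002", "J:003",
                      "J:001", "J:004", "J:005", "J:006", "J:007"),
  stringsAsFactors = FALSE
)

# One row per judge per decision, so rows per ECLI equal the chamber size
chamber_size <- table(toy_assignments$ecli)
chamber_size
```

The resulting table can be matched back onto a decision-level data set by its names, which are the ECLI numbers.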
The procedures data set observes the procedures before the Court. The most common procedure types are references for preliminary rulings, actions for failure to fulfil obligations, actions for annulment, actions for failure to act, damages for non-contractual liability, appeals, and staff cases.
cjeu_case_id
- The ID code of the case before the Court. If multiple cases are joined, this variable lists the first of the joined cases.
ecli
- ECLI numbers are the official ID codes given by the CJEU to its decisions.
procedure
- The type of procedure. While there are a number of different procedures, the most common ones are, in decreasing order: reference for a preliminary ruling, action for annulment, staff case, action for failure to fulfill obligations (also known as infringement procedures), appeal, action for damages, and appeal against a penalty.
is_successful
- Binary variable observing whether the procedure is successful. Always 0 for preliminary references, as this is not a relevant question for these procedures.
is_unfounded
- Binary variable observing whether the Court declares a case to be unfounded.
is_inadmissible
- Binary variable observing whether the Court declares a case to be inadmissible.
This data set observes the parties to cases before the CJEU. Each row represents a party to a case, along with the standardised type of actor, such as member states or EU institutions.
cjeu_case_id
- The ID code of the case before the Court.
party_role
- The role of the party before the CJEU: applicant, defendant, or litigant.
party
- The name of the party.
party_type
- The type of party, such as legal person, EU institution, and EU member state.
The citations table includes the references included in a decision of the Court.
cjeu_case_id
- The ID code of the case before the Court.
ecli
- The ECLI number of the decision in which the citations are found.
cited_celex
- The CELEX number of the cited legislation.
cited_type
- The type of decision cited, such as treaty, legislation, or case law.
cited_subtype
- Offers more fine-grained citation types, such as regulation or judgment (Court of Justice).
cited_detail
- Observes information such as the cited article and paragraph.
The observers table lists observations made by EU member states, EU institutions, EU agencies, EFTA member states, and EFTA institutions to preliminary reference procedures before the CJEU.
cjeu_case_id
- The ID code of the case before the Court.
ecli
- The ECLI number of the decision in which the observations are found.
observer
- The name of the party submitting the observation, for example European Commission or Italy.
observer_type
- The type of the party submitting the observation, for example EU institution or EU member state.
1.3. Downloading data
Data can be downloaded using the download_data function of the iuropa package. Downloading the entire judges data set can be done as follows:
judges <- download_data(
component="cjeu_database_platform",
table = "judges"
)
While feasible for a smaller data set such as the one listing the judges of the CJEU, downloading the entire data set all at once is not particularly efficient for larger data sets. A good idea can therefore be to specify some filters for which data to download.
In order to specify such filters, you must first decide which data you are interested in. One option is to consult the code book; another is to use the describe_variables function to get familiar with the contents of a given table. For example, the following code downloads and views a description of all the variables in the decisions data set:
description <- describe_variables(
component = "cjeu_database_platform",
table = "decisions"
)
View(description)
From this description we find that the decisions data set contains 19 different variables, as well as what kind of values these variables contain. The variable court may for example contain one of three different values: "Court of Justice", "General Court", or "Civil Service Tribunal". In contrast, other variables may contain a wide variety of values, such as cjeu_case_id, which lists the official ID code of the procedure a case is lodged in.
We can use this information to download a specific subset of the data. For example, we might be interested in downloading all decisions published in Costa v ENEL, which can be done by specifying the case ID of Costa: C-6/64. The following code loads the three decisions of the Court in Costa v ENEL into an object called costa.
costa <- download_data(
component = "cjeu_database_platform",
table = "decisions",
filters = list(
cjeu_case_id = "C-6/64"
)
)
Instead of specifying a single case, a vector can be used to define several potential values. For example, replacing "C-6/64" with c("C-26/62", "C-6/64") will download all decisions in both Costa v ENEL (C-6/64) and Van Gend en Loos (C-26/62).
As additional constraints, we may also define a list of variables to download, along with a filter containing a number of specifications. Let's say we are interested in downloading all judgments of the Court of Justice in cases lodged from 2010 through 2019. Furthermore, we are only interested in the case ID, the ECLI number, the list of procedure types, whether the procedure is a preliminary ruling, the judge rapporteur, the subject matter, and the date of the judgments. We can specify this data selection as follows, downloading the data into an object named judgments:
judgments <- download_data(
component = "cjeu_database_platform",
table = "judgments",
filters = list(
case_year = 2010:2019, # A vector: c(2010, 2011, ... 2019)
court = "Court of Justice"
),
variables = c(
"cjeu_case_id",
"ecli",
"decision_date",
"list_procedures", # List of procedure types
"is_preliminary_ruling", # If preliminary ruling
"list_subject_keywords", # Subject matters
"judge_rapporteur", # Name of the judge rapporteur
"iuropa_judge_rapporteur_id" # ID code of rapporteur
)
)
A word of advice: always include ID codes observed at the unit of analysis you are downloading the data for. As this data is observed at the level of decisions, ECLI numbers provide a unique ID code for all observations. This will be useful if you need to include more variables later on, or to identify which decision an observation represents.
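Whether a variable really identifies each row uniquely can be checked before relying on it. A minimal sketch on an invented data frame (case IDs and ECLI numbers are placeholders), using the base R function anyDuplicated:

```r
# Toy stand-in for a decision-level data set; all values are invented
toy_judgments <- data.frame(
  cjeu_case_id = c("C-1/10", "C-2/10", "C-2/10"),
  ecli = c("ECLI:EU:C:0000:1", "ECLI:EU:C:0000:2", "ECLI:EU:C:0000:3"),
  stringsAsFactors = FALSE
)

# anyDuplicated returns 0 when no value occurs more than once
ecli_is_unique    <- anyDuplicated(toy_judgments$ecli) == 0
case_id_is_unique <- anyDuplicated(toy_judgments$cjeu_case_id) == 0

ecli_is_unique
case_id_is_unique
```

In this toy example the case ID repeats because one case produced two decisions, which is why the ECLI number is the safer row identifier at the decision level.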
2. Data management
Downloading the data is, of course, only the beginning. Organizing and making sense of the data is a major task, and one that will require a decent amount of work. Here I will present two things to get you started: moving variables from one data set to another by matching ID codes, and working with list variables using the apply family of functions in R.
2.1. Combining data across data sets
Let's say we are interested in the gender of the judge rapporteur drafting the judgments of the Court. This poses a challenge, as the gender of the judges of the CJEU is known from the judges data set, but this information is not included directly in the judgments set. In order to get the gender of the judge rapporteur, we therefore have to download this from the judges data set.
We start out with the judgments data set from section 1.3. First, we observe all unique ID codes of the judge rapporteurs (iuropa_judge_rapporteur_id), which correspond to the iuropa_judge_id variable in the judges data set. We store all the observed ID codes in an object called rapporteurs.
We then download the data set of ID codes and gender (is_female) from the judges data set, using the rapporteurs object to filter out only relevant observations. Last, we use the match function to add the is_female variable (renamed as rapporteur_is_female) into judgments.
The match function returns, for each value in its first vector, the position of the first matching value in its second vector; in this case, the vectors are ID codes observed in two different data tables. If you for example want to insert variable var from a table called data2 into another table called data1, with both tables containing the ID code id, it could be done using the following: data1$var <- data2$var[match(data1$id, data2$id)]. For merging two data tables you might also be interested in checking out the merge function.
# Gather ID codes of all judge rapporteurs observed in the data set
rapporteurs <-
unique(judgments$iuropa_judge_rapporteur_id)
# Download data from judges data set
judges <- download_data(
component="cjeu_database_platform",
table="judges",
filters = list(
iuropa_judge_id = rapporteurs
),
variables = c("iuropa_judge_id", "is_female")
)
# Insert variable rapporteur_is_female into judgments data frame
judgments$rapporteur_is_female <-
judges$is_female[match(
judgments$iuropa_judge_rapporteur_id,
judges$iuropa_judge_id)]
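To see what match does in isolation, here is a small sketch with two invented tables, data1 and data2, named after the generic example in the text. It also shows the merge alternative; note that merge may reorder rows, whereas the match approach keeps data1 in its original order.

```r
# Two invented tables sharing the ID code "id"
data1 <- data.frame(id = c("b", "a", "c", "a"), stringsAsFactors = FALSE)
data2 <- data.frame(id = c("a", "b"), var = c(1, 0), stringsAsFactors = FALSE)

# For each data1$id, the row position of its first match in data2$id
positions <- match(data1$id, data2$id)
positions  # NA marks IDs that are absent from data2

# Use the positions to pull the variable across
data1$var <- data2$var[positions]

# merge with all.x = TRUE keeps every row of the first table
merged <- merge(data1["id"], data2, by = "id", all.x = TRUE)
```

IDs that are missing from the second table end up as NA rather than being dropped, so it is worth checking for missing values after matching.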
Now that we have the gender of the judge rapporteurs in the judgments of cases lodged in the 2010s, we can observe the percentage of women rapporteurs in the judgments of the Court:
# Get percentage distribution of male and female judge rapporteurs
table(judgments$rapporteur_is_female)/
nrow(judgments)*100
# Get percentage of male and female judge rapporteurs observed in the period
table(judges$is_female)/
nrow(judges)*100
We find that 19.4% of the judgments in cases lodged in the 2010s had female judge rapporteurs. Furthermore, we observe that 18.6% of the judge rapporteurs active in the period were female. This suggests that the under-representation of women in the rapporteur role reflects a lack of women on the bench, and that the women who do serve as judge rapporteurs are no less active than their male counterparts.
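One caveat with dividing a table by the number of rows: table ignores missing values by default, so rows with NA still count in the denominator. A minimal sketch on an invented vector shows the difference and uses the base R function prop.table, which divides by the non-missing total instead:

```r
# Invented indicator with one missing value
is_female <- c(0, 0, 1, 0, NA, 1, 0, 0)

# Dividing by the full length keeps the NA row in the denominator
share_all_rows <- table(is_female) / length(is_female) * 100

# prop.table divides by the number of non-missing values only
share_observed <- round(100 * prop.table(table(is_female)), 1)

share_all_rows
share_observed
```

Which denominator is appropriate depends on the question; the point is only to be aware of which one is being used.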
2.2. Dealing with lists
In some variables, more than one value may be observed for the same observation. When this is the case they are presented as lists in text format, separated by commas (for most variables) or semicolons (for subject matters, which might themselves contain commas). The easiest way to filter these variables is to conduct a simple text search. This is done below using the grep function, listing all ECLI numbers where at least one subject matter contains the search term "tax".
# Judgments with subject matter mentioning "tax"
tax_judgments <- judgments$ecli[
grep("tax",
judgments$list_subject_keywords,
ignore.case = TRUE)
]
The grep function allows for the use of regular expressions, providing powerful search functionality. For example, a pipe symbol (|) means "OR": a search for "tax|trade" will return all observations where either tax or trade is mentioned.
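The "OR" behaviour can be tried out on a small invented vector of subject matters before running it on the real data. The sketch below uses grepl, the sibling of grep that returns TRUE or FALSE for each element:

```r
# Invented subject matters for illustration
subjects <- c("Taxation", "External trade", "Environment", "Value added tax")

# TRUE wherever either "tax" or "trade" appears, ignoring case
hits <- grepl("tax|trade", subjects, ignore.case = TRUE)

subjects[hits]
```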
Searching in this way can be useful for identifying specific types of procedure using the list_procedures variable in judgments:
# Create binary variable indicating whether a judgment is in an action for failure to fulfil obligations.
# The pattern stops at "fulfil" so that it matches both the "fulfil" and "fulfill" spellings.
judgments$infringement <-
grepl("action for failure to fulfil",
judgments$list_procedures)
Though useful in many cases, doing such a broad search in list variables does not always provide the best outcome. For example, you might want to find decisions where the subject matter is "Taxation" specifically, rather than merely mentioning taxation. The stringr package provides useful tools for handling lists. Below, str_split is used to create a list variable called subject_matter.
# load package stringr.
# Install with install.packages("stringr") if missing.
library(stringr)
# Turn list of subject matters into real list
judgments$subject_matter <-
str_split(judgments$list_subject_keywords, "; ") # Replace with comma if comma separated list
Having the variable as a proper list enables a number of possibilities. Let's first create an object subject_matters, listing all the unique subject matters observed in the data, and then use grep to identify subject matters related to taxation.
The unique function returns a vector of all unique values in an object; for example, unique(c("cat", "dog", "dog", "rabbit", "dog")) returns c("cat", "dog", "rabbit").
# Observe unique subject matters:
subject_matters <- unique(unlist(judgments$subject_matter))
# Return subject matters related to tax
subject_matters[grep("tax",
subject_matters,
ignore.case = TRUE)]
We observe four different tax-related policy areas: "Taxation", "Value added tax", "Indirect taxation", and "Internal taxation". Some policy areas are, however, more common than others; by using the table function we can observe the number of observations in each one, and by filtering by the number of observations we can identify the most central subject matters observed in our judgments data set.
# Create table of subject matters
subject_matter_table <- table(unlist(judgments$subject_matter))
# Identify the most common subject matters
# Keep subject matters with more than 300 observations and sort by number of observations in decreasing order.
sort(subject_matter_table[which(subject_matter_table > 300)],
decreasing = TRUE)
This shows that "Taxation" is the second most common subject matter, with 560 judgments in the observed period.
As "Taxation" is the only subject matter spelling taxation with an upper case T, a case-sensitive text search using grepl would work just fine for this data set to identify all observations belonging to this subject matter without false positives. This might, however, not always be the case: for example, searching for "Euro" would return every subject matter mentioning the word "European" or "Europe", not just those relating to the currency.
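The false-positive problem is easy to reproduce on a small invented vector. The sketch below contrasts a partial text search with two ways of requiring an exact match; the subject matter names are made up for illustration.

```r
# Invented subject matters for illustration
subject_matters <- c("Euro", "European citizenship", "External relations")

# Partial match: also catches "European citizenship"
partial <- grepl("Euro", subject_matters)

# Exact match by direct comparison
exact <- subject_matters == "Euro"

# Exact match with a regular expression anchored at start (^) and end ($)
anchored <- grepl("^Euro$", subject_matters)
```

Exact comparison works here because each element holds a single subject matter; for text lists holding several subject matters in one string, the list needs to be split first.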
One solution to this problem is found in the apply family of functions. The following code adds a binary variable taxation to the judgments data set, observing whether Taxation is one of the policy areas of the judgment.
# Binary variable observing whether "Taxation" is a subject matter
judgments$taxation <- sapply(
judgments$subject_matter, function(y)
"Taxation" %in% y
)
This is a powerful starting point for working with lists. In this example, "Taxation" %in% y can be replaced with any command producing a TRUE or FALSE outcome, where y represents the contents of each individual row of subject_matter. If you want to test a command before including it in the apply function, you can assign an individual observation of judgments$subject_matter to an object y and run the command on its own:
y <- judgments$subject_matter[[1]] # double brackets return the character vector itself
"Taxation" %in% y
This returns FALSE, which is correct as "Taxation" is not one of the subject matters of the first row of the judgments data set.
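When pulling a single observation out of a list variable for testing, the choice of brackets matters: single brackets return a list of length one, while double brackets return the character vector stored inside it, which is what the function inside sapply receives. A minimal sketch on an invented list:

```r
# Invented list variable: one character vector of subject matters per row
subject_matter <- list(
  c("Taxation", "Environment"),
  "Agriculture and Fisheries"
)

# Single brackets: still a list, wrapping the first element
wrapped <- subject_matter[1]
is.list(wrapped)

# Double brackets: the character vector itself, as seen inside sapply
y <- subject_matter[[1]]
is.character(y)

# The membership test behaves as intended on the character vector
first_has_taxation <- "Taxation" %in% y
```

Testing with double brackets therefore mirrors what happens inside the apply call, whereas testing on the wrapped list can give misleading results.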
As within regular expressions, a pipe symbol can be used to define multiple alternative criteria, reading as "OR": TRUE | FALSE returns TRUE, as one of the two arguments is TRUE. In contrast, & can be used to mean "AND": TRUE & FALSE will return FALSE, whereas TRUE & (TRUE | FALSE) returns TRUE. Finally, ! can be used to mean "NOT": !TRUE equals FALSE and vice versa.
The which function is used to observe which elements of a logical vector are TRUE: for example, which(c(TRUE, TRUE, FALSE, FALSE, TRUE)) will return c(1, 2, 5). It can be used to identify the rows of a data set in which a given criterion is met.
Below, this logic is illustrated with a series of examples:
# Judgments related to indirect taxation OR internal taxation:
indirect_or_internal <- which(
sapply(
judgments$subject_matter, function(y)
"Indirect taxation" %in% y | "Internal taxation" %in% y
)
)
# Both Taxation AND Environment:
environment_tax <- which(
sapply(
judgments$subject_matter, function(y)
"Environment" %in% y & "Taxation" %in% y
)
)
# Taxation AND NOT Value added tax:
tax_not_vat <- which(
sapply(
judgments$subject_matter, function(y)
"Taxation" %in% y & !"Value added tax" %in% y
)
)
# Return number of judgments related to both taxation and environment:
length(environment_tax)
# Return the ECLI numbers of these cases:
judgments$ecli[environment_tax]
# Create subset of data containing only judgments related to taxation and environment
judgments_env_tax <- judgments[environment_tax,]
If you are seeking to create a list variable where, like in subject_matter, there can be multiple values for each observation, this can be achieved by using the lapply function instead of sapply. We might, for example, be interested in creating a list variable observing the outcome of procedures in the list_procedures variable. There might be several of these outcomes listed in the same judgment, and they are listed within parentheses after the procedure type, separated by commas.
A possible value is for example:
action for failure to fulfil obligations (successful, unfounded, inadmissible)
In order to create a list of outcomes we need to extract the text within the parentheses, and split the list by comma. One way of doing so is as follows, taking advantage of pipes to perform actions in multiple steps within the lapply function.
judgments$outcome <- lapply(
judgments$list_procedures, function(y)
y |>
str_extract("\\(.*\\)") |> # Extracts text within parentheses
str_remove_all("\\(|\\)") |> # Removes ( and )
str_split(", ") |> # Splits the list at commas
unlist() # Flattens said list to remove complexity
)
In the above example, the use of lapply and pipes is combined with three functions from the stringr package. First, str_extract is used to extract text from a string. In this case, a regular expression is used to extract everything between the first and the last parentheses observed: in \\(.*\\), the . is used to indicate any character, whereas * marks any number of repetitions of it. As parentheses carry a specific meaning in regular expressions, they are escaped with a double backslash (\\) in order to be interpreted as characters rather than within the logic of regular expressions.
If faced with several occurrences of parenthesised text within the same text string, one could consider instead using str_extract_all in combination with .*?, which would extract all parentheses separately and present them as a list. The question mark indicates that rather than observing as many occurrences of any character as possible, the expression is to observe as few as possible, in this case ending as soon as the first end of a parenthesis is observed.
After extracting the parentheses, str_remove_all is used to remove them, leaving only the content within them. Again, the parentheses are escaped with a double backslash, and | is used within the regular expression to mean "or". Finally, str_split is used to turn the comma-separated list into a proper list of outcomes, before unlist turns the list into a vector in order to remove unnecessary complexity.
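The difference between the greedy pattern and the lazy .*? variant can be seen on a single invented string holding two parenthesised parts. The string below is made up for illustration and does not necessarily reflect how the database formats several procedures in one value.

```r
library(stringr)

# Invented example with two sets of parentheses in one string
x <- "action for annulment (successful); appeal (unfounded, inadmissible)"

# Greedy: runs from the first "(" to the last ")"
greedy <- str_extract(x, "\\(.*\\)")

# Lazy: stops at the first ")" each time, giving one match per parenthesis
lazy <- str_extract_all(x, "\\(.*?\\)")[[1]]

# Strip the parentheses and split each match at commas
outcomes <- str_split(str_remove_all(lazy, "\\(|\\)"), ", ")
```

Here outcomes is a list with one character vector per parenthesis, so the outcomes stay attached to the procedure they belong to.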
3. Analysing data
In this section I present some examples of how the data can be analysed, focusing on graphical representations of the data. I use the package ggplot2, which must be installed if it has not been used before (install.packages("ggplot2")).
3.1. Length of procedures over time
Let's say we are interested in the average length of preliminary reference procedures over time. In order to get to this, we would first need to download a data set of the judgments of the Court in preliminary reference procedures containing the length of its procedures:
# Download data set of preliminary rulings
judgments <- download_data(
component = "cjeu_database_platform",
table = "judgments",
filters = list(
is_preliminary_ruling = 1
),
variables = c(
"cjeu_case_id",
"ecli",
"decision_date",
"duration_days",
"list_subject_keywords",
"is_urgent_procedure"
)
)
In ggplot2, the variables defining the aesthetics of the graph are mapped within the aes function: in this first example, we use it to define what we want to be plotted on the X and Y axes. With these aesthetic mappings defined, geom_smooth is called upon to draw a smoothed line through the average values over time.
In order to ensure that decision_date is interpreted as a date rather than as a set of characters, it is wrapped within the as.Date function.
library(ggplot2)
ggplot(judgments,
aes(
x = as.Date(decision_date),
y = duration_days
)) +
geom_smooth()
This provides a very simple plot showing the average length of preliminary reference procedures before the Court of Justice over time.
Having produced a minimal working example, we might go on to make some improvements to the plot, such as defining the minimum and maximum values to start the Y axis at zero and labelling the axes in a more appealing way. Furthermore, we might add a third dimension to the plot, drawing several lines: Below, the figure is adapted to draw urgent procedures in a separate line with a different colour.
# Create character variable for urgent procedure.
# Ifelse works as follows:
# if "argument 1 is TRUE" then "argument 2", else "argument 3".
judgments$urgent <-
ifelse(judgments$is_urgent_procedure == 1,
"Urgent procedure",
"Not urgent procedure")
ggplot(judgments,
aes(
x = as.Date(decision_date),
y = duration_days,
col = urgent
)) +
geom_smooth() +
coord_cartesian(ylim = c(0, 800), # define limits on y axis
expand = FALSE) + # start plot in 0
labs(x = "Date of decision",
y = "Procedure time (days)",
col = "") + # the lines need no further description
theme(legend.position = "top")
This produces a graph plotting urgent and non-urgent procedures separately, illustrating that urgent procedures shorten procedure times in preliminary rulings by more than 400 days in most of the observed period.
3.2. Number of judgments in a policy area over time
Next, let's say we want to create a simple bar plot showing the number of environmental judgments over time. First, we download a data set of judgments by the Court of Justice and the General Court:
# Download data set of judgments by the Court of Justice and the General Court
judgments <- download_data(
component = "cjeu_database_platform",
table = "judgments",
filters = list(
court = c("Court of Justice",
"General Court")
),
variables = c(
"cjeu_case_id",
"ecli",
"decision_date",
"court",
"list_subject_keywords"
)
)
Having downloaded the data, we can select the environment cases and add a variable observing the year of the decision. We also create an additional variable observing whether the judgment relates to pollution, agriculture and fisheries, or public health. As a shortcut, these categories are treated as mutually exclusive, even though five judgments fall into more than one of them (in the code below, the last matching category is the one that is kept). This is not enough to influence the interpretation of the graph in a significant way, but it is nevertheless worth noting.
# Year of judgment:
judgments$decision_date <-
as.Date(judgments$decision_date)
judgments$decision_year <-
as.numeric(format(judgments$decision_date, "%Y"))
# Subject matter as list:
judgments$subject_matter <-
stringr::str_split(judgments$list_subject_keywords, "; ") # str_split comes from the stringr package
# Data set of environment judgments:
x <- which(sapply(judgments$subject_matter, function(y)
"Environment" %in% y
))
environment <- judgments[x,]
# Type of environmental case:
environment$type_environment <- "Other"
environment$type_environment[
grep("Agriculture and Fisheries", environment$subject_matter)] <-
"Agriculture and Fisheries"
environment$type_environment[
grep("Public health", environment$subject_matter)] <-
"Public health"
environment$type_environment[
grep("Pollution", environment$subject_matter)] <-
"Pollution"
# Observe how many judgments fit in multiple categories:
sum(grepl("Agriculture and Fisheries", environment$subject_matter) &
grepl("Public health", environment$subject_matter) |
grepl("Agriculture and Fisheries", environment$subject_matter) &
grepl("Pollution", environment$subject_matter) |
grepl("Public health", environment$subject_matter) &
grepl("Pollution", environment$subject_matter))
# It is only five.
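To see what the splitting and matching steps do, here is a small sketch on made-up keyword strings. It assumes, as in the code above, that keywords are separated by "; ", and it uses base R's strsplit in place of str_split so that it runs without any extra packages. The vector keywords and its contents are hypothetical.

```r
# Toy keyword strings: hypothetical, for illustration only
keywords <- c(
  "Environment; Pollution",
  "Agriculture and Fisheries",
  "Environment; Public health; Pollution"
)

# strsplit() returns a list with one character vector per judgment
subject_matter <- strsplit(keywords, "; ", fixed = TRUE)

# TRUE where "Environment" is one of the keywords of the judgment
is_environment <- sapply(subject_matter, function(y) "Environment" %in% y)

# Number of the three subcategories that each judgment falls into
subcategories <- c("Agriculture and Fisheries", "Public health", "Pollution")
n_categories <- sapply(subject_matter, function(y) sum(subcategories %in% y))

# Number of judgments that fit in more than one subcategory
n_overlap <- sum(n_categories > 1)
print(n_overlap)
```

Counting the matches per judgment is one alternative to chaining several grepl calls, and it extends easily if more subcategories are added.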
Having prepared the data, it's time to draw the plot. The geom_bar function is used to draw a bar plot in ggplot, with the fill colour indicating the subcategories of subject matter we defined in the type_environment variable. As we want the bar plot to simply count the number of observations in each year, no Y axis needs to be defined.
# Draw bar plot
ggplot(environment,
aes(
x = decision_year,
fill = type_environment
)) +
geom_bar() +
labs(x = "Year",
y = "Number of CJEU judgments on environment",
fill = "Type of issue:") +
theme(legend.position = "top")
This produces the following graph:
The figure shows an increasing number of environmental cases until around 2010, after which the number stabilised (and dropped slightly during COVID). While the number of judgments relating to pollution has dropped since the early 2010s, environmental cases that are also seen as relating to public health have started appearing since the late 2000s. Agriculture and Fisheries, which accounted for the first CJEU judgments categorised as relating to the environment, has become relatively sidelined, though a small number of these judgments are still delivered on a near-yearly basis.
When using the subject matter variable it is important to keep in mind that this is based on the official metadata of the CJEU and the EU: It is likely that some cases might relate to the environment yet go under the radar of the official classification.
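The numbers behind a stacked bar plot like the one above can also be inspected directly as a table. The sketch below uses a made-up data frame (toy_env, with hypothetical values) that has the same two columns as the plot: the year and the type of issue.

```r
# Toy data: hypothetical values, for illustration only
toy_env <- data.frame(
  decision_year = c(2009, 2009, 2010, 2010, 2010),
  type_environment = c("Pollution", "Other", "Pollution",
                       "Public health", "Pollution")
)

# Cross-tabulate year (rows) by type of issue (columns)
counts <- table(toy_env$decision_year, toy_env$type_environment)
print(counts)

# Row sums give the total height of each bar
yearly_totals <- rowSums(counts)
print(yearly_totals)
```

Each cell corresponds to one coloured segment of a bar, which makes it easier to check exact values that are hard to read off the figure.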
3.3. Age of appointed judges
The appointments data set can be used to analyse the age of judges when they are appointed to the CJEU. Furthermore, we can compare the age of appointed judges according to a number of different criteria, such as the tribunal they are appointed to or the gender of the judge.
First, we need to download data on the appointments of judges. I chose to limit the selection to judges (excluding registrars and advocates general) and to exclude the Civil Service Tribunal.
# Download judge appointments
appointments <- download_data(
component = "cjeu_database_platform",
table = "appointments",
filters = list(
position = "judge",
court = c("Court of Justice",
"General Court")
),
variables = c(
"iuropa_judge_id",
"birth_year",
"is_female",
"member_state",
"court",
"start_date")
)
The next step is to calculate the age of the judges at the time of appointment. This will be approximate, as only the year and not the date of birth is included in the data set. We can calculate the age by first converting start_date into a year by using a combination of as.Date, format, and as.numeric. The age of the judge can then be calculated using simple subtraction.
# Convert start date to date format
appointments$start_date <- as.Date(appointments$start_date)
# Age at beginning of appointment
appointments$start_year <-
as.numeric(
format(
appointments$start_date,
"%Y" # Changes date to year (in character format)
)
)
appointments$age <-
appointments$start_year - appointments$birth_year
Having identified the age of the judges when they were first appointed, we may take a moment to consider possible problems with our empirical approach.
First, some judges might have multiple recorded appointments to the same tribunal, if they left the tribunal for a period of time and later returned to it. We might do well to remove these observations of repeated appointments to the same tribunal.
Second, the age of appointed judges might change over time. This matters because the General Court (established 1989) is a younger tribunal than the Court of Justice (established 1952), so a comparison across all years would include early Court of Justice appointments that have no General Court counterpart. While one way of controlling for this would be to plot the average age of appointed judges over time, similar to what is done above for the length of procedures, I instead choose here to exclude judges appointed before the first appointments were made to the General Court on September 25, 1989.
# Sort in order of appointment
appointments <- appointments[order(appointments$start_date),]
# Remove observations of judges who returned to the same tribunal after a break
# This line keeps observations that are not duplicates of both judge ID and court.
appointments <-
appointments[which(!duplicated(
paste0(
appointments$iuropa_judge_id,
appointments$court
)
)
),]
# Identify the date of the first appointment made to the General Court
GC_start <- min(
appointments$start_date[
which(appointments$court == "General Court")
])
# Exclude observations prior to the first General Court appointment
# (>= keeps the appointments made on that first date itself)
appointments <- appointments[which(
appointments$start_date >= GC_start
),]
# Add gender variable with text labels
appointments$gender <- ifelse(appointments$is_female == 1,
"Female",
"Male")
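The de-duplication and age steps can be tried out on a small made-up example before running them on the full data. In the sketch below, toy_app and all its values are hypothetical. One judge (J1) returns to the same court after a break, and another (J2) moves from one court to the other.

```r
# Toy appointments: hypothetical values, for illustration only
toy_app <- data.frame(
  iuropa_judge_id = c("J1", "J2", "J1", "J3", "J2"),
  court = c("Court of Justice", "General Court", "Court of Justice",
            "General Court", "Court of Justice"),
  start_date = as.Date(c("1985-10-07", "1989-09-25", "1994-10-07",
                         "1995-01-18", "1998-03-10")),
  birth_year = c(1930, 1940, 1930, 1945, 1940)
)

# Sort by date so that the earliest appointment comes first
toy_app <- toy_app[order(toy_app$start_date), ]

# Keep only the first appointment for each judge-court pair.
# A separator is used so that IDs and court names cannot run together.
key <- paste(toy_app$iuropa_judge_id, toy_app$court, sep = "|")
first_only <- toy_app[!duplicated(key), ]

# Approximate age at appointment: start year minus birth year
first_only$age <-
  as.numeric(format(first_only$start_date, "%Y")) - first_only$birth_year

print(first_only)
```

The second appointment of J1 to the Court of Justice is dropped, while both appointments of J2 are kept because they are to different courts.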
Having prepared the data, the only thing remaining is to draw the plots. A box plot is a useful way of representing the distribution of data, and will be used for this purpose here. I decide to plot the two courts on the X axis, the age of the judges on the Y axis, and to fill the boxes with colour according to gender.
ggplot(appointments,
aes(
x = court,
y = age,
fill = gender
)) +
geom_boxplot() +
labs(x = "Court",
y = "Age of judge",
fill = "Gender:") +
theme(legend.position = "top")
Box plots represent the distribution of observations. The boxes themselves represent the span of observations between the 25th and 75th percentiles, meaning that half the observations fall within the boxes. The thick line within each box shows the median value. The thin vertical lines, known as whiskers, extend to the most extreme observations that lie within 1.5 times the interquartile range of the box. By default, geom_boxplot draws any observations beyond the whiskers as separate points.
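The numbers behind a box plot can be computed by hand. The following sketch uses base R's boxplot.stats on a made-up vector of ages (the values are hypothetical). Note that ggplot2 computes the edges of the box as quantiles, so its values may differ slightly from those of boxplot.stats, which uses hinges.

```r
# Toy ages: hypothetical values, for illustration only
ages <- c(41, 45, 48, 50, 52, 53, 55, 58, 60, 85)

# boxplot.stats() returns the five numbers drawn in a box plot:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
# With the default coef = 1.5, whiskers stop at the most extreme
# observation within 1.5 times the box length from the box.
box <- boxplot.stats(ages)
print(box$stats)

# Observations beyond the whiskers, which are drawn as separate points
print(box$out)

# Quartiles as computed by quantile(), for comparison
print(quantile(ages, c(0.25, 0.5, 0.75)))
```

In this toy example the oldest observation lies far from the rest, so it is reported as an outlier and the upper whisker stops at the second-highest value.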
The figure shows clear differences both between the two courts and between female and male judges in the CJEU. Unsurprisingly, the General Court tends to receive younger judges than the Court of Justice. The extreme observations are, however, more widely spread in the General Court, which might in part be a consequence of it having seen more appointments in total in the period due to its expansion to two judges per member state.
Perhaps more interestingly, the women appointed to the CJEU tend to be younger than their male colleagues, and especially so in the Court of Justice: The median age of female appointments to the Court of Justice in the period is lower than the 25th percentile of male judges. That said, as only twelve women were appointed to the Court of Justice in the period, one should be careful not to draw overly strong conclusions based on this evidence.
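Because small groups can make a box plot look more decisive than it is, it can be useful to check how many observations lie behind each box. Here is a minimal sketch on made-up data (toy_judges and its values are hypothetical) that counts the observations and computes the median age in each combination of court and gender:

```r
# Toy data: hypothetical values, for illustration only
toy_judges <- data.frame(
  court = c("Court of Justice", "Court of Justice", "Court of Justice",
            "General Court", "General Court", "General Court"),
  gender = c("Female", "Male", "Male", "Female", "Female", "Male"),
  age = c(50, 58, 60, 45, 49, 52)
)

# Number of observations behind each box
group_sizes <- table(toy_judges$court, toy_judges$gender)
print(group_sizes)

# Median age within each combination of court and gender
group_medians <- aggregate(age ~ court + gender,
                           data = toy_judges, FUN = median)
print(group_medians)
```

Applied to the real appointments data, the first table would show directly how few observations some of the boxes are based on.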
4. Closing remarks
While I of course only scratch the surface here, I hope this helps to open the door to studying the CJEU using empirical methods. My plan is to update this document as I hear from people who want help with specific challenges, or based on any feedback I receive. If you have such feedback, or if there is anything you would like an introduction to when it comes to using the data on the CJEU, don't hesitate to send me an email! The same goes if you need additional information that is not currently included in the IUROPA database, as I might be able to help you out.
For those interested in digging deeper into the world of R, I can only repeat my recommendation of R for Data Science.
If you do end up using the data from the IUROPA platform, please cite the CJEU Database Platform: Decisions and Decision-Makers. You can find the BibLaTeX and RIS entries here.