Genomic Environment Predicts Expression Patterns on the Human Inactive X Chromosome
Written by Chungoo Park
If you have any questions concerning the source code or its usage, or need
assistance, please feel free to contact me (cxp440@psu.edu).
Analyze I or E subgenomes in Xp22.
Step One: Define the Inactivated and Escape subgenome composition.
Extract gene lists from the X-inactivation profile, which comes from a
supplementary file in Carrel and Willard (2005), "X-inactivation profile
reveals extensive variability in X-linked gene expression in females", Nature
2005 Mar 17;434(7031):400-404. Genes were considered to be X inactivated if
silenced in all nine inactive-X-containing hybrids assayed or if expressed in
only a single hybrid (0/9 or 1/9); these are the candidate members of the I
subgenome. Similarly, genes were scored as escaping XCI if expressed in eight
or nine of the nine inactive X hybrids tested (8/9 or 9/9); these are used for
the E subgenome. Note in particular that, for the analysis of the entire X
chromosome, ESTs were not used to build the I or E subgenome.
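The scoring rule above can be sketched as a small function. This is only an illustration of the thresholds stated in the text: the name classify_xci and the label "intermediate" are hypothetical and do not appear in the original scripts, which match the "N / 9" text in each line instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original scripts): classify a gene
# from the number of inactive X hybrids, out of nine, that express it.
sub classify_xci {
    my ($expressed, $total) = @_;
    $total = 9 unless defined $total;
    return 'I' if $expressed <= 1;             # 0/9 or 1/9: subject to inactivation
    return 'E' if $expressed >= $total - 1;    # 8/9 or 9/9: escapes inactivation
    return 'intermediate';                     # 2/9 .. 7/9: used for neither subgenome
}

# Print the class for every possible count.
for my $n (0 .. 9) {
    printf "%d / 9 => %s\n", $n, classify_xci($n);
}
```

Genes with intermediate counts fall into neither list, which matches the behaviour of the script in step 1.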
1. Create two files ("Whole_X_new_subject_list.txt" and
"Whole_X_new_escape_list.txt"). Each file holds the profiles of the genes that
met the corresponding criterion.
* Source Code: Make_list_for_subject_escape.pl
#!/usr/bin/perl -w
use strict;
use warnings;

my $file = "../../SourceData/Expression_X.txt";
my $Out_Subject = "Whole_X_new_subject_list.txt";
my $Out_Escape = "Whole_X_new_escape_list.txt";
my $cLine_File;

open (FILE, "<$file") || die "Sorry, I couldn't open the $file file: $!\n";
while (defined($cLine_File = <FILE>)){
    chomp($cLine_File);
    # Skip entries annotated as pseudoautosomal.
    if (!($cLine_File =~ /Pseudoautosomal/)){
        # 0/9 or 1/9: append to the subject (I subgenome) list.
        if ($cLine_File =~ /0 \/ 9/){
            open (SUB, ">>$Out_Subject") || die "Sorry, I couldn't open the $Out_Subject file: $!\n";
            print SUB "$cLine_File\n";
            close (SUB);
        }
        if ($cLine_File =~ /1 \/ 9/){
            open (SUB, ">>$Out_Subject") || die "Sorry, I couldn't open the $Out_Subject file: $!\n";
            print SUB "$cLine_File\n";
            close (SUB);
        }
        # 8/9 or 9/9: append to the escape (E subgenome) list.
        if ($cLine_File =~ /8 \/ 9/){
            open (ESC, ">>$Out_Escape") || die "Sorry, I couldn't open the $Out_Escape file: $!\n";
            print ESC "$cLine_File\n";
            close (ESC);
        }
        if ($cLine_File =~ /9 \/ 9/){
            open (ESC, ">>$Out_Escape") || die "Sorry, I couldn't open the $Out_Escape file: $!\n";
            print ESC "$cLine_File\n";
            close (ESC);
        }
    }
}
close(FILE);
2. Divide an input file ("Whole_X_new_subject_list.txt" or
"Whole_X_new_escape_list.txt") into two files
("Whole_X_new_subject_list_none_ESTs.txt" and "Whole_X_new_subject_list_Ests.txt",
or "Whole_X_new_escape_list_none_ESTs.txt" and "Whole_X_new_escape_list_Ests.txt")
by the presence or absence of ESTs.
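The split rule used by the script listed for this step can be sketched as a single predicate: a line counts as an EST entry when it contains the string "EST" but not the phrase "EST support". The helper name is_est_entry and the sample records are made up for illustration and are not taken from the source data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the two pattern tests in Make_list_by_EST.pl.
sub is_est_entry {
    my ($line) = @_;
    return 0 unless $line =~ /EST/;        # no EST mention: goes to the none_ESTs list
    return 0 if $line =~ /EST support/;    # "EST support" lines also go to the none_ESTs list
    return 1;                              # any other line mentioning EST
}

# Made-up records, for illustration only.
my @records = (
    "GENE_A\tknown gene",
    "GENE_B\tgene with EST support",
    "ENTRY_C\tEST",
);
for my $record (@records) {
    my $target = is_est_entry($record) ? "Ests" : "none_ESTs";
    print "$record => $target\n";
}
```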
* Source Code: Make_list_by_EST.pl
#!/usr/bin/perl -w
use strict;
use warnings;

# Set up for the escape list; point these three names at the subject list
# files to split "Whole_X_new_subject_list.txt" instead.
my $file = "Whole_X_new_escape_list.txt";
my $Out_None_Ests = "Whole_X_new_escape_list_none_ESTs.txt";
my $Out_Ests = "Whole_X_new_escape_list_Ests.txt";
my $cLine_File;

open (FILE, "<$file") || die "Sorry, I couldn't open the input file: $!\n";
while (defined($cLine_File = <FILE>)){
    chomp($cLine_File);
    if ($cLine_File =~ /EST/){
        if ($cLine_File =~ /EST support/){
            # Lines that only mention "EST support" stay in the none-EST list.
            open (NON_EST, ">>$Out_None_Ests") || die "Sorry, I couldn't open the output_none_EST file: $!\n";
            print NON_EST "$cLine_File\n";
            close (NON_EST);
        }else{
            # Any other line mentioning EST goes to the EST list.
            open (EST, ">>$Out_Ests") || die "Sorry, I couldn't open the output_EST file: $!\n";
            print EST "$cLine_File\n";
            close (EST);
        }
    }else{
        # No mention of EST: none-EST list.
        open (NON_EST, ">>$Out_None_Ests") || die "Sorry, I couldn't open the output_none_EST file: $!\n";
        print NON_EST "$cLine_File\n";
        close (NON_EST);
    }
}
close(FILE);
3. Build three different-sized subgenomes (100kb, 200kb and 500kb) surrounding
the transcription start sites (TSSs) of genes. Note that the 100/200/500kb
contigs include 50/100/250kb upstream and downstream of the TSS.
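As a minimal sketch of how such a window can be computed, the function below takes the TSS to be the gene start on the plus strand and the gene end on the minus strand, which mirrors the strand test in the scripts of this step. The name tss_window and the coordinates in the example are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the (start, end) of the window centred on the
# TSS, extending $flank bases upstream and downstream.
sub tss_window {
    my ($gene_start, $gene_end, $strand, $flank) = @_;
    my $tss = ($strand eq '-') ? $gene_end : $gene_start;   # minus strand: TSS at gene end
    return ($tss - $flank, $tss + $flank);
}

# Made-up gene coordinates; a 50kb flank gives the 100kb subgenome window.
my ($from, $to) = tss_window(1_000_000, 1_020_000, '+', 50_000);
print "$from - $to\n";   # prints "950000 - 1050000"
```

Like the original scripts, this sketch does not clamp the window at the chromosome ends, so a gene near the start of the chromosome can yield a negative start coordinate.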
* Source Code for I subgenome: Make_Subgenome_subject_against_WholeX.pl
- Output: Subgenome_Subject_50K_wo_EST.txt or
Subgenome_Subject_100K_wo_EST.txt or
Subgenome_Subject_250K_wo_EST.txt
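#!/usr/bin/perl -w
use strict;
use warnings;

my $subject = "Whole_X_new_subject_list_none_ESTs.txt";
my $escape = "Whole_X_new_escape_list_none_ESTs.txt";
my %unManaged = ();
my %Managed = ();
# Flank on each side of the TSS: 250000 builds the 500kb subgenome
# (50000 and 100000 correspond to the 100kb and 200kb subgenomes).
my $sequence_length = 250000;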

# Return 0 if the start or the end of the given window falls inside the TSS
# window of any escape gene, and 1 otherwise.
sub check_escape{
    my $func_start_seq = $_[0];
    my $func_end_seq = $_[1];
    open (CLON_ESC, "< $escape") || die "Sorry, I couldn't open the escape.txt for clone: $!\n";
    my $func_cLine;
    my ($esc_start_seq, $esc_end_seq);
    while (defined($func_cLine = <CLON_ESC>)){
        # Each line begins with "start - end<TAB>strand<TAB>".
        my @func_Database = $func_cLine =~ /^(\S+) - (\S+)\t(\S*|\s*)\t/;
        my $func_gene_start = $func_Database[0];
        my $func_gene_end = $func_Database[1];
        my $func_strand = $func_Database[2];
        # On the minus strand the TSS is the gene end coordinate.
        if ($func_strand eq "-"){
            $esc_start_seq = $func_gene_end - $sequence_length;
            $esc_end_seq = $func_gene_end + $sequence_length;
        }else{
            $esc_start_seq = $func_gene_start - $sequence_length;
            $esc_end_seq = $func_gene_start + $sequence_length;
        }
        if (($func_start_seq >= $esc_start_seq) && ($func_start_seq < $esc_end_seq)){
            close(CLON_ESC);
            return 0;
        }
        if (($func_end_seq > $esc_start_seq) && ($func_end_seq <= $esc_end_seq)){
            close(CLON_ESC);
            return 0;
        }
    }
    # Close before returning; in the original order the close was never reached.
    close(CLON_ESC);
    return 1;
}

# Merge the window that starts at $func_first with overlapping windows still
# held in %unManaged, then record the result in %Managed.
sub merge_seq{
    my $func_first = $_[0];
    my $func_end = $_[1];
    my ($begin, $end);
    delete $unManaged{$func_first};
    while (($begin, $end) = each(%unManaged)){
        if (($func_first < $begin) && ($func_end > $begin)){
            $func_end = $end;
            delete $unManaged{$begin};
        }
        if (($func_first < $end) && ($func_end > $end)){
            $func_first = $begin;
            delete $unManaged{$begin};
        }
    }
    $Managed{$func_first} = $func_end;
}

open (SUBJECT, "< $subject") || die "Sorry, I couldn't open the subject.txt: $!\n";
my $cLine;
my @Database = ();
my ($first, $last);
while (defined($cLine = <SUBJECT>)){
    my ($start_seq, $end_seq);
    # Each line begins with "start - end<TAB>strand<TAB>".
    @Database = $cLine =~ /^(\S+) - (\S+)\t(\S*|\s*)\t/;
    my $gene_start = $Database[0];
    my $gene_end = $Database[1];
    my $strand = $Database[2];
    # On the minus strand the TSS is the gene end coordinate.
    if ($strand eq "-"){
        $start_seq = $gene_end - $sequence_length;
        $end_seq = $gene_end + $sequence_length;
    }else{
        $start_seq = $gene_start - $sequence_length;
        $end_seq = $gene_start + $sequence_length;
    }
    my $result_escape = check_escape($start_seq, $end_seq);
    if ($result_escape eq "1"){