Genomic Environment Predicts Expression Patterns

on the Human Inactive X Chromosome

 

Written by Chungoo Park

 

If you have any questions concerning the source code or its usage, or need assistance, please feel free to contact me (cxp440@psu.edu).

 

 

* Analyze I or E subgenomes in Xp22.

Step One: Define the Inactivated and Escape subgenome composition.

 

Extract gene lists from the X-inactivation profile provided as a supplementary file in Carrel and Willard (2005), "X-inactivation profile reveals extensive variability in X-linked gene expression in females", Nature 434(7031):400-404. Genes were considered X-inactivated if they were silenced in all nine inactive-X-containing hybrids assayed or expressed in only a single hybrid (0/9 or 1/9); these are candidate members of the I subgenome. Similarly, genes were scored as escaping XCI if they were expressed in eight or nine of the nine inactive-X hybrids tested (8/9 or 9/9); these are used for the E subgenome. Note in particular that, for the analysis of the entire X chromosome, ESTs were not used to build the I or E subgenome.
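The classification rule above can be sketched as a small Perl function. This is only an illustration: classify_gene is a hypothetical helper that is not part of the original scripts, and it assumes the number of expressing hybrids has already been parsed from the profile.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original scripts): classify a gene by the
# number of inactive-X hybrids, out of nine tested, in which it is expressed.
sub classify_gene {
    my ($expressed, $tested) = @_;

    # Only genes assayed in all nine hybrids are classified.
    return 'unclassified' unless defined $tested && $tested == 9;

    return 'I' if $expressed <= 1;    # 0/9 or 1/9: candidate for the I subgenome
    return 'E' if $expressed >= 8;    # 8/9 or 9/9: candidate for the E subgenome
    return 'unclassified';            # 2/9 to 7/9: assigned to neither list
}

# Print the class for every possible count out of nine.
for my $n (0 .. 9) {
    printf "%d/9 -> %s\n", $n, classify_gene($n, 9);
}
```

Genes expressed in two to seven hybrids fall outside both criteria and therefore appear in neither list.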

 

1. Create two files ("Whole_X_new_subject_list.txt" and "Whole_X_new_escape_list.txt"). Each file holds the profiles of the genes that meet the corresponding criterion.

 

   * Source Code: Make_list_for_subject_escape.pl   

 

#!/usr/bin/perl -w

use strict;
use warnings;

# Input: X-inactivation expression profile.
# Outputs: candidate lists for the I (subject) and E (escape) subgenomes.
my $file = "../../SourceData/Expression_X.txt";
my $Out_Subject = "Whole_X_new_subject_list.txt";
my $Out_Escape = "Whole_X_new_escape_list.txt";

my $cLine_File;

open (FILE, "<$file") || die "Sorry, I couldn't open the $file file: $!\n";

# Open each output once, in append mode, instead of once per matching line.
open (SUB, ">>$Out_Subject") || die "Sorry, I couldn't open the $Out_Subject file: $!\n";
open (ESC, ">>$Out_Escape") || die "Sorry, I couldn't open the $Out_Escape file: $!\n";

while (defined($cLine_File = <FILE>)){
    chomp($cLine_File);

    # Skip lines annotated as Pseudoautosomal.
    if (!($cLine_File =~ /Pseudoautosomal/)){

        # Expressed in 0 or 1 of 9 hybrids: subject to X inactivation.
        if ($cLine_File =~ /0 \/ 9/){
            print SUB "$cLine_File\n";
        }
        if ($cLine_File =~ /1 \/ 9/){
            print SUB "$cLine_File\n";
        }

        # Expressed in 8 or 9 of 9 hybrids: escapes X inactivation.
        if ($cLine_File =~ /8 \/ 9/){
            print ESC "$cLine_File\n";
        }
        if ($cLine_File =~ /9 \/ 9/){
            print ESC "$cLine_File\n";
        }
    }
}

close(FILE);
close(SUB);
close(ESC);

 

2. Divide an input file ("Whole_X_new_subject_list.txt" or "Whole_X_new_escape_list.txt") into two files ("Whole_X_new_subject_list_none_ESTs.txt" and "Whole_X_new_subject_list_Ests.txt", or "Whole_X_new_escape_list_none_ESTs.txt" and "Whole_X_new_escape_list_Ests.txt") according to the presence or absence of ESTs.
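As a minimal sketch of the test that Make_list_by_EST.pl applies to each line: a line is treated as an EST entry when it contains "EST", unless it also contains "EST support". The helper name is_est_entry and the sample lines are hypothetical placeholders, not taken from the real input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original scripts): returns 1 when a profile
# line should go to the EST list and 0 when it belongs in the none_ESTs list.
sub is_est_entry {
    my ($line) = @_;
    return ($line =~ /EST/ && $line !~ /EST support/) ? 1 : 0;
}

# Made-up sample lines, used only to exercise the predicate.
my @samples = (
    "placeholder\tEST\tmore",
    "placeholder\tEST support\tmore",
    "placeholder\tgene\tmore",
);

for my $line (@samples) {
    printf "%s => %s\n", $line, is_est_entry($line) ? "EST list" : "none_ESTs list";
}
```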

 

   * Source Code: Make_list_by_EST.pl   

 

#!/usr/bin/perl -w

use strict;
use warnings;

# Shown for the escape list; change the three file names to process the subject list.
my $file = "Whole_X_new_escape_list.txt";
my $Out_None_Ests = "Whole_X_new_escape_list_none_ESTs.txt";
my $Out_Ests = "Whole_X_new_escape_list_Ests.txt";

my $cLine_File;

open (FILE, "<$file") || die "Sorry, I couldn't open the input file: $!\n";

# Open each output once, in append mode, instead of once per line.
open (NON_EST, ">>$Out_None_Ests") || die "Sorry, I couldn't open the output_none_EST file: $!\n";
open (EST, ">>$Out_Ests") || die "Sorry, I couldn't open the output_EST file: $!\n";

while (defined($cLine_File = <FILE>)){
    chomp($cLine_File);

    # Lines containing "EST" go to the EST file unless they also contain "EST support".
    if (($cLine_File =~ /EST/) && !($cLine_File =~ /EST support/)){
        print EST "$cLine_File\n";
    }else{
        print NON_EST "$cLine_File\n";
    }
}

close(FILE);
close(NON_EST);
close(EST);

 

3. Build subgenomes of three different sizes (100 kb, 200 kb and 500 kb) centered on the transcription start sites (TSSs) of genes. The 100/200/500 kb contigs extend 50/100/250 kb upstream and downstream of the TSS, respectively.
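The window arithmetic can be sketched as follows. This is an illustrative helper only: the name tss_window and the example coordinates are hypothetical. It assumes, as the scripts below do, that the TSS is the gene start on the plus strand and the gene end on the minus strand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original scripts): return the window that
# extends $flank bases on each side of a gene's transcription start site.
sub tss_window {
    my ($gene_start, $gene_end, $strand, $flank) = @_;

    # On the minus strand the TSS is the larger coordinate.
    my $tss = ($strand eq '-') ? $gene_end : $gene_start;

    return ($tss - $flank, $tss + $flank);
}

# Made-up coordinates for a minus-strand gene, shown for the three flank sizes.
for my $flank (50_000, 100_000, 250_000) {
    my ($from, $to) = tss_window(1_000_000, 1_020_000, '-', $flank);
    printf "flank %6d: %d - %d (%d kb)\n", $flank, $from, $to, ($to - $from) / 1000;
}
```

A flank of 50,000 bases therefore gives a 100 kb window, which is why the output files are named after the flank (50K, 100K, 250K) rather than the total subgenome size.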

 

         * Source Code for I subgenome: Make_Subgenome_subject_against_WholeX.pl

            - Output: Subgenome_Subject_50K_wo_EST.txt or

                            Subgenome_Subject_100K_wo_EST.txt or

                            Subgenome_Subject_250K_wo_EST.txt

 

#!/usr/bin/perl -w

 

use strict;

use warnings;

 

my $subject = "Whole_X_new_subject_list_none_ESTs.txt";

my $escape = "Whole_X_new_escape_list_none_ESTs.txt";

 

my %unManaged = ();

my %Managed = ();

my $sequence_length = 250000;    # flank on each side of the TSS: 50000, 100000 or 250000 (matches the 50K/100K/250K output names)

 

sub check_escape{

                  my $func_start_seq = $_[0];

                  my $func_end_seq = $_[1];

 

                  open (CLON_ESC, "< $escape")     ||

                                    die "Sorry, I couldn't open the escape.txt for clone: $!\n";

                  my $func_cLine;

                  my ($esc_start_seq, $esc_end_seq);

                  while (defined($func_cLine = <CLON_ESC>)){

 

                                    my @func_Database = $func_cLine =~ /^(\S+) - (\S+)\t(\S*|\s*)\t/;

                                    my $func_gene_start = $func_Database[0];

                                    my $func_gene_end = $func_Database[1];

                                    my $func_strand = $func_Database[2];

 

                                    if ($func_strand eq "-"){

                                                      $esc_start_seq = $func_gene_end - $sequence_length;

                                                      $esc_end_seq = $func_gene_end + $sequence_length;

                                    }else{

                                                      $esc_start_seq = $func_gene_start - $sequence_length;

                                                      $esc_end_seq = $func_gene_start + $sequence_length;

                                    }

 

                                    if (($func_start_seq >= $esc_start_seq) && ($func_start_seq < $esc_end_seq)){

                                                      close(CLON_ESC);

                                                      return 0;

                                    }

                                    if (($func_end_seq > $esc_start_seq) && ($func_end_seq <= $esc_end_seq)){

                                                      close(CLON_ESC);

                                                      return 0;

                                    }

                  }

                  close(CLON_ESC);

                  return 1;

}

 

sub merge_seq{

                  my $func_first = $_[0];

                  my $func_end = $_[1];

                  my ($begin, $end);

                  delete $unManaged{$func_first};

                  while (($begin, $end) = each(%unManaged)){

                                    if (($func_first < $begin) && ($func_end > $begin)){

                                                      $func_end = $end;

                                                      delete $unManaged{$begin};

                                    }                

                                    if (($func_first < $end) && ($func_end > $end)){

                                                      $func_first = $begin;

                                                      delete $unManaged{$begin};

                                    }

                  }

                  $Managed{$func_first} = $func_end;

}

 

open (SUBJECT, "< $subject")        ||

                  die "Sorry, I couldn't open the subject.txt: $!\n";

 

my $cLine;

my @Database = ();

my ($first, $last);

 

while (defined($cLine = <SUBJECT>)){

                  my ($start_seq, $end_seq);

                  @Database = $cLine =~ /^(\S+) - (\S+)\t(\S*|\s*)\t/;

                  my $gene_start = $Database[0];

                  my $gene_end = $Database[1];

                  my $strand = $Database[2];

 

                  if ($strand eq "-"){

                                    $start_seq = $gene_end - $sequence_length;

                                    $end_seq = $gene_end + $sequence_length;

                  }else{

                                    $start_seq = $gene_start - $sequence_length;

                                    $end_seq = $gene_start + $sequence_length;

                  }

                 

                  my $result_escape = check_escape($start_seq, $end_seq);

                  if ($result_escape eq "1"){